This function uses the ART read simulator (Huang et al., Bioinformatics 2012) to generate synthetic sequencing reads from a reference FASTA file for multiple single cells. It supports both paired-end and single-end reads and simulates the cells in parallel across multiple cores. FASTQ files are written for each cell, and the ART output is written to log files.
simulate_read_sc(
  fasta_input,
  output_prefixes,
  readLen = 100,
  depth = 1,
  artPath = "art_illumina",
  seqSys = "HS25",
  paired = TRUE,
  numCores = 2,
  otherArgs = ""
)

fasta_input: Character string. Path to the input FASTA file.
output_prefixes: Character vector. Prefixes for output files, one per simulated cell.
readLen: Numeric. Length of the simulated reads in base pairs. Default is 100.
depth: Numeric. Sequencing coverage depth. Default is 1.
artPath: Character string. Path to the ART Illumina executable. Default is "art_illumina".
seqSys: Character string. Sequencing system to simulate. Default is "HS25" (HiSeq 2500).
paired: Logical. Whether to generate paired-end reads (TRUE) or single-end reads (FALSE). Default is TRUE.
numCores: Numeric. Number of cores to use for parallel processing. Default is 2.
otherArgs: Character string. Additional arguments to pass to ART Illumina. Default is "".
No return value, called for side effects of generating simulated read files and printing completion messages.
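Because nothing is returned, downstream code has to locate the generated files on disk. The sketch below is one way to do that. It assumes the function keeps ART's default naming, where a paired-end run with prefix cell1_ writes cell1_1.fq and cell1_2.fq and a single-end run writes cell1_.fq; this page does not confirm that the names are left unchanged. The helper expected_fastq is hypothetical and not part of the package.

```r
# Hypothetical helper: FASTQ paths ART would write for each prefix,
# assuming ART's default output naming is kept.
expected_fastq <- function(output_prefixes, paired = TRUE) {
  suffixes <- if (paired) c("1.fq", "2.fq") else ".fq"
  # One list entry per cell, holding the FASTQ path(s) for that prefix
  files <- lapply(output_prefixes, function(p) paste0(p, suffixes))
  names(files) <- output_prefixes
  files
}

fq <- expected_fastq(c("cell1_", "cell2_"))
fq[["cell1_"]]
# "cell1_1.fq" "cell1_2.fq"

# Cells for which at least one expected file is not on disk
incomplete <- Filter(function(f) !all(file.exists(f)), fq)
```

Checking for the expected files after the call is a simple way to detect cells whose simulation failed, since failures are otherwise only visible in the log files.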
This function creates synthetic sequencing reads by calling ART Illumina once per output prefix, running the calls in parallel with parallel::mclapply. For paired-end reads, it sets a mean fragment length of 200 bp with a standard deviation of 10 bp. ART's standard output and error messages are redirected to log files.
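The call pattern described here can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the package's actual source. The helpers build_art_args and run_art_for_cells are hypothetical, as are the log file names and the mapping of readLen and depth to ART's -l (read length) and -f (fold coverage) options. The flags themselves (-ss, -i, -l, -f, -o, -p, -m, -s) are standard art_illumina options.

```r
# Hypothetical helper: assemble the art_illumina arguments for one cell.
build_art_args <- function(fasta_input, prefix, readLen = 100, depth = 1,
                           seqSys = "HS25", paired = TRUE, otherArgs = "") {
  args <- c(
    "-ss", seqSys,               # sequencing system profile
    "-i", shQuote(fasta_input),  # reference FASTA
    "-l", readLen,               # read length (assumed mapping)
    "-f", depth,                 # fold coverage (assumed mapping)
    "-o", shQuote(prefix)        # output prefix for this cell
  )
  # Paired-end: mean fragment length 200 bp, standard deviation 10 bp
  if (paired) args <- c(args, "-p", "-m", 200, "-s", 10)
  if (nzchar(otherArgs)) args <- c(args, otherArgs)
  args
}

# Hypothetical driver: one ART call per prefix, spread over numCores workers.
# stdout and stderr go to per-cell log files (file names are assumptions).
run_art_for_cells <- function(fasta_input, output_prefixes,
                              artPath = "art_illumina", numCores = 2, ...) {
  invisible(parallel::mclapply(output_prefixes, function(prefix) {
    system2(artPath,
            build_art_args(fasta_input, prefix, ...),
            stdout = paste0(prefix, "art.out.log"),
            stderr = paste0(prefix, "art.err.log"))
  }, mc.cores = numCores))
}
```

Separating argument construction from execution makes it possible to inspect the command for a cell, for example with build_art_args("reference.fa", "cell1_"), without ART being installed.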
if (FALSE) { # \dontrun{
# Simulate reads for 3 cells with default parameters
simulate_read_sc(
  fasta_input = "reference.fa",
  output_prefixes = c("cell1_", "cell2_", "cell3_")
)

# Simulate single-end reads with custom parameters
simulate_read_sc(
  fasta_input = "reference.fa",
  output_prefixes = c("cell1_", "cell2_"),
  readLen = 150,
  depth = 0.01,
  paired = FALSE,
  ...
)
} # }