Simulates sequencing reads for multiple single cells. For each cell, the function synthesizes a cell-specific genome carrying that cell's mutations, then generates sequencing reads with the ART simulator. Cells are processed in batches to manage memory usage.

simulate_sc_dynamic_reads_for_batches(
  sampled_cell_idx,
  dynamics_ob,
  sc_founder_genomes,
  all_sampled_sc_mutations,
  sc_mut_table_with_nt,
  sim_updates,
  fa_dir,
  output_dir,
  art_path,
  n_cores = 4,
  depth = 30,
  readLen = 150,
  otherArgs = "",
  keep_genome_files = FALSE,
  batch_size = 10
)

Arguments

sampled_cell_idx

Vector of cell indices for which to simulate reads

dynamics_ob

List object returned by simulate_sc_dynamics containing cell lineage information

sc_founder_genomes

List containing genome information for clone founders with elements: all_node_genomes, child_node_founders_df

all_sampled_sc_mutations

List containing mutation information for sampled cells

sc_mut_table_with_nt

Data frame containing mutation details with nucleotide changes

sim_updates

List containing updated simulation information including all_node_segments

fa_dir

Character string specifying the directory to store FASTA files

output_dir

Character string specifying the directory for output read files

art_path

Character string specifying the path to the ART sequencing simulator executable

n_cores

Integer specifying the number of CPU cores to use for parallel processing (default: 4)

depth

Numeric value specifying the desired sequencing depth/coverage (default: 30)

readLen

Integer specifying the length of simulated reads in base pairs (default: 150)

otherArgs

Character string with additional arguments to pass to ART (default: "")

keep_genome_files

Logical indicating whether to retain the temporary FASTA genome files after simulation (default: FALSE)

batch_size

Integer specifying the number of cells to process in each batch (default: 10)

Value

A named list with one element per cell. Each element contains the mutation check results for that cell and is named by the cell index.
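Because the list is named by cell index, a numeric index has to be converted to a character string before lookup; otherwise R indexes by position. The sketch below illustrates this with a mock list. The helper get_cell_check() and the placeholder element contents are hypothetical, since the structure of each element is not described here.

```r
# Hypothetical helper (not part of the package): look up a cell's result by name.
get_cell_check <- function(checks, cell_idx) {
  # as.character() forces lookup by name rather than by position
  checks[[as.character(cell_idx)]]
}

# Mock return value; the real element contents are not documented here.
mock_checks <- list(
  "50" = "placeholder for cell 50",
  "51" = "placeholder for cell 51"
)

res_51 <- get_cell_check(mock_checks, 51)
print(res_51)
```

Note that mock_checks[[51]] would fail with a subscript error, because the list has only two elements by position.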

Details

This function performs the following steps for each batch of cells:

  1. For each cell in the batch:

    • Extracts cell lineage information from dynamics_ob

    • Synthesizes a cell-specific genome with mutations using synth_sc_genome()

    • Performs checks on the generated genome using check_genome_mutations()

    • Validates the sanity check results with validate_mutation_check()

    • Combines maternal and paternal haplotypes into a single FASTA file

    • Writes the merged genome to disk

  2. Simulates sequencing reads for all cells in the batch in parallel using simulate_read_dynamic_sc()

  3. Optionally removes temporary genome files to save disk space

Processing cells in batches helps limit memory usage when simulating reads for many cells. The function relies on the helper functions synth_sc_genome(), check_genome_mutations(), validate_mutation_check(), and simulate_read_dynamic_sc().
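The batching described above can be sketched in a few lines of base R. This is only an illustration of how a vector of cell indices might be split into consecutive groups of batch_size; the helper split_into_batches() is hypothetical, and the function's actual internal batching may differ.

```r
# Hypothetical helper (not part of the package): split cell indices into
# consecutive batches of at most `batch_size` cells.
split_into_batches <- function(cell_idx, batch_size = 10) {
  stopifnot(batch_size >= 1)
  # Assign each position a batch number: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(cell_idx) / batch_size)
  unname(split(cell_idx, batch_id))
}

batches <- split_into_batches(c(50, 51, 52, 53, 54), batch_size = 2)

# Each batch would then be processed in turn: genomes synthesized and
# written per cell, reads simulated in parallel, temporary files removed.
for (batch in batches) {
  message("Batch with cells: ", paste(batch, collapse = ", "))
}
```

With five cells and batch_size = 2, this produces three batches, the last of which contains a single cell.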

Examples

if (FALSE) { # \dontrun{
# Assuming you have already run a simulation and have necessary objects
mutation_checks <- simulate_sc_dynamic_reads_for_batches(
  sampled_cell_idx = c(50, 51, 52, 53, 54),
  dynamics_ob = sc_dynamics_result,
  sc_founder_genomes = founder_genomes,
  all_sampled_sc_mutations = sampled_mutations,
  sc_mut_table_with_nt = mutations_with_nt,
  sim_updates = simulation_updates,
  fa_dir = "data/genomes/",
  output_dir = "results/reads/",
  art_path = "/usr/local/bin/art_illumina",
  n_cores = 4,
  depth = 30,
  readLen = 150,
  keep_genome_files = FALSE,
  batch_size = 2
)

# Check results for a specific cell
print(mutation_checks[["50"]])
} # }
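Before a long run, it can help to confirm that the directories and the ART executable exist. The following is a minimal sketch using base R only. The helpers prepare_read_sim_dirs() and art_exists() are hypothetical, and whether simulate_sc_dynamic_reads_for_batches() creates missing directories itself is not stated here, so this check is a precaution rather than a requirement.

```r
# Hypothetical pre-flight helpers (not part of the package).

# Create the FASTA and output directories if they do not exist yet.
prepare_read_sim_dirs <- function(fa_dir, output_dir) {
  for (d in c(fa_dir, output_dir)) {
    if (!dir.exists(d)) {
      dir.create(d, recursive = TRUE, showWarnings = FALSE)
    }
  }
  invisible(all(dir.exists(c(fa_dir, output_dir))))
}

# Check that the path given for the ART executable points to an existing file.
art_exists <- function(art_path) {
  nzchar(art_path) && file.exists(art_path)
}

# Example with temporary directories, so nothing is written to the project tree.
fa_dir <- file.path(tempdir(), "genomes")
output_dir <- file.path(tempdir(), "reads")
dirs_ok <- prepare_read_sim_dirs(fa_dir, output_dir)

if (!art_exists("/usr/local/bin/art_illumina")) {
  message("ART executable not found at the given path")
}
```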