Simulates the nucleotide change for a single mutation from its genomic context. Handles regular mutations, recurrent mutations, and mutations that fall in lost segments.

simulate_single_nt_change(
  single_mutation,
  seg_list,
  genome_sequence,
  nt_transition_matrix,
  recurrent = FALSE,
  original_nt = NA
)

Arguments

single_mutation

Data frame row containing information about a single mutation

seg_list

List structure containing segment information for genome regions

genome_sequence

List of DNA sequences representing the reference genome

nt_transition_matrix

Matrix specifying nucleotide transition probabilities

recurrent

Logical indicating whether this is a recurrent mutation (default: FALSE)

original_nt

Character specifying the original nucleotide for recurrent mutations (required when recurrent = TRUE, default: NA)
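
The layout of nt_transition_matrix is not spelled out above. The sketch below shows one plausible shape, assuming a 4 x 4 matrix whose rows are the original nucleotide, whose columns are the alternative nucleotide, and whose rows each sum to 1 with a zero diagonal. The probabilities, the name example_matrix, and the helper sample_alternative_nt() are illustrative assumptions, not part of the package.

```r
# Assumed layout (not confirmed by the package): rows = original nucleotide,
# columns = alternative nucleotide, each row sums to 1, diagonal is 0.
nts <- c("A", "C", "G", "T")
example_matrix <- matrix(
  c(0.0, 0.2, 0.6, 0.2,
    0.2, 0.0, 0.2, 0.6,
    0.6, 0.2, 0.0, 0.2,
    0.2, 0.6, 0.2, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(nts, nts)
)

# Hypothetical helper: draw one alternative nucleotide for a given original,
# weighting the candidates by the matching row of the matrix.
sample_alternative_nt <- function(original_nt, transition_matrix) {
  probs <- transition_matrix[original_nt, ]
  sample(colnames(transition_matrix), size = 1, prob = probs)
}

set.seed(1)
alt <- sample_alternative_nt("A", example_matrix)
```

Because the diagonal is zero under this assumption, the sampled alternative always differs from the original nucleotide.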

Value

The updated single_mutation row with additional fields:

  • seg_id: Segment identifier for the mutation

  • ref_pos: Reference position within the segment

  • original_nt: Original nucleotide at the mutation site

  • alternative_nt: Mutated nucleotide

  • processed: Set to TRUE indicating the mutation has been processed

Details

This function simulates nucleotide-level details for a single mutation by:

  1. Identifying the genomic segment containing the mutation

  2. Determining whether the segment has been lost through deletion or other structural variants

  3. For mutations in lost segments:

    • Setting both original and alternative nucleotides to "N"

  4. For mutations in non-lost segments:

    • For regular mutations: Looking up the original nucleotide from the reference genome and sampling an alternative nucleotide based on the transition matrix

    • For recurrent mutations: Using the provided original nucleotide and sampling an alternative nucleotide based on the transition matrix
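
The branching in steps 3 and 4 can be sketched roughly as follows. This is a simplified stand-in, not the package's implementation: sketch_nt_change() is a hypothetical name, the segment lookup is skipped by passing a ready-made seg_info list, and the fields lost, chrom, and ref_pos are assumed for illustration. The genome is assumed to be a named list of character strings.

```r
# Hypothetical stand-in for the documented logic; the real function and the
# structure of its segment information may differ.
sketch_nt_change <- function(seg_info, genome_sequence, nt_transition_matrix,
                             recurrent = FALSE, original_nt = NA) {
  if (isTRUE(seg_info$lost)) {
    # Lost segment: both nucleotides are reported as "N".
    return(list(original_nt = "N", alternative_nt = "N"))
  }
  if (!recurrent) {
    # Regular mutation: read the original nucleotide from the reference.
    original_nt <- substr(genome_sequence[[seg_info$chrom]],
                          seg_info$ref_pos, seg_info$ref_pos)
  } else if (is.na(original_nt)) {
    stop("original_nt must be supplied when recurrent = TRUE")
  }
  # Both non-lost cases sample the alternative from the matrix row.
  alternative_nt <- sample(colnames(nt_transition_matrix), size = 1,
                           prob = nt_transition_matrix[original_nt, ])
  list(original_nt = original_nt, alternative_nt = alternative_nt)
}

nts <- c("A", "C", "G", "T")
# Uniform off-diagonal matrix, used only for illustration.
tm <- matrix(1 / 3, 4, 4, dimnames = list(nts, nts))
diag(tm) <- 0
genome <- list(chr1 = "ACGTACGT")

lost_result <- sketch_nt_change(list(lost = TRUE), genome, tm)
regular_result <- sketch_nt_change(
  list(lost = FALSE, chrom = "chr1", ref_pos = 3), genome, tm
)
recurrent_result <- sketch_nt_change(
  list(lost = FALSE), genome, tm, recurrent = TRUE, original_nt = "T"
)
```

In this sketch the regular and recurrent branches differ only in where original_nt comes from, which mirrors the description above.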

The function relies on the helper get_segment_info() to map the mutation coordinates to the appropriate genomic segment.