This function introduces single nucleotide polymorphisms (SNPs) into a reference genome
based on a list of SNPs. The SNPs are categorized as homozygous alternative (1|1) or heterozygous (1|0 or 0|1),
and SNPs at duplicated positions are removed before insertion.
The result is a synthetic genome with SNPs inserted into both the maternal and the paternal haplotype.
insert_snps_to_genome(seg_names, snp_list, ref_genome)
seg_names: Character vector of segment/chromosome names to process.
snp_list: A nested list containing SNP information for each segment, with sublists for the different genotypes ('1|1', '1|0', '0|1'), each containing a data frame with at least POS and ALT columns.
ref_genome: A DNAStringSet or similar object containing the reference sequences, with names matching seg_names.
A list with two components:
sim_genome: List containing maternal and paternal haplotype sequences
snp_info: List containing SNP information for each haplotype
The function processes each segment as follows:
Combines the homozygous SNPs with the heterozygous SNPs that belong to each haplotype
Identifies and removes SNPs with duplicated positions
Inserts alternative alleles into the reference sequence
Stores both modified sequences and SNP information
Maternal haplotypes receive '1|0' heterozygous variants, while paternal haplotypes receive '0|1' variants. Both haplotypes receive all homozygous alternative ('1|1') variants.
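The per-segment steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `build_haplotype()` and `simulate_segment()` are hypothetical, sequences are plain character strings rather than DNAStringSet elements, POS is assumed to be 1-based, and every SNP whose position occurs more than once in a haplotype's combined set is dropped (the real function may handle duplicates differently, e.g. by keeping one of them).

```r
# Hypothetical helper: return the POS/ALT columns, or an empty table if the
# genotype sublist is missing for this segment.
pick_snps <- function(df) {
  if (is.null(df)) {
    return(data.frame(POS = integer(0), ALT = character(0)))
  }
  df[, c("POS", "ALT"), drop = FALSE]
}

# Hypothetical helper: build one haplotype of one segment.
# ref_seq: a single character string; hom/het: data frames with POS and ALT.
build_haplotype <- function(ref_seq, hom, het) {
  # Step 1: combine homozygous-ALT SNPs with this haplotype's heterozygous SNPs
  snps <- rbind(pick_snps(hom), pick_snps(het))

  # Step 2 (assumption): drop every SNP whose position occurs more than once
  dup <- duplicated(snps$POS) | duplicated(snps$POS, fromLast = TRUE)
  snps <- snps[!dup, , drop = FALSE]

  # Step 3: substitute ALT alleles at their (assumed 1-based) positions
  bases <- strsplit(ref_seq, "")[[1]]
  stopifnot(all(snps$POS >= 1 & snps$POS <= length(bases)))
  bases[snps$POS] <- as.character(snps$ALT)

  # Step 4: return both the modified sequence and the SNPs actually applied
  list(seq = paste(bases, collapse = ""), snps = snps)
}

# Hypothetical wrapper: maternal gets '1|0', paternal gets '0|1', both get '1|1'
simulate_segment <- function(ref_seq, seg_snps) {
  list(
    maternal = build_haplotype(ref_seq, seg_snps[["1|1"]], seg_snps[["1|0"]]),
    paternal = build_haplotype(ref_seq, seg_snps[["1|1"]], seg_snps[["0|1"]])
  )
}

# Usage with the chr1 SNPs from the example below
chr1_snps <- list(
  "1|1" = data.frame(POS = 1, ALT = "G"),
  "1|0" = data.frame(POS = 5, ALT = "T"),
  "0|1" = data.frame(POS = 8, ALT = "C")
)
res <- simulate_segment("ACTGACTGACTG", chr1_snps)
res$maternal$seq  # "GCTGTCTGACTG"
res$paternal$seq  # "GCTGACTCACTG"
```

With a DNAStringSet, `as.character(ref_genome[["chr1"]])` yields the plain string this sketch expects. Splitting the sequence into a character vector keeps the substitution a single vectorized assignment.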
if (FALSE) { # \dontrun{
library(Biostrings) # provides DNAStringSet()

# Example reference genome
ref_genome <- DNAStringSet(c(
chr1 = "ACTGACTGACTG",
chr2 = "GTCAGTCAGTCA"
))
# Example SNP list
snp_list <- list(
chr1 = list(
"1|1" = data.frame(POS = 1, ALT = "G"),
"1|0" = data.frame(POS = 5, ALT = "T"),
"0|1" = data.frame(POS = 8, ALT = "C")
),
chr2 = list(
"1|1" = data.frame(POS = 2, ALT = "A"),
"1|0" = data.frame(POS = 6, ALT = "G"),
"0|1" = data.frame(POS = 9, ALT = "T")
)
)
result <- insert_snps_to_genome(c("chr1", "chr2"), snp_list, ref_genome)
} # }