Converts a phased VCF table into a list of filtered SNP data frames, split by genotype category (homozygous alternative and heterozygous variants).

vcf_to_snp_list(phased_vcf_table, sample_name)

Arguments

phased_vcf_table

A data frame containing phased VCF data with at least the following columns: CHROM, POS, REF, ALT, and a sample-specific genotype column whose name is given by sample_name

sample_name

Character string specifying the name of the sample column in the VCF table

Value

A list containing four data frames:

  • all: All SNPs (single-nucleotide variants only)

  • 1|1: Homozygous alternative variants

  • 1|0: Heterozygous variants with alternative allele on first haplotype

  • 0|1: Heterozygous variants with alternative allele on second haplotype

Details

The function keeps only single-nucleotide variants (rows where REF and ALT are each a single character) and splits the sample column into its genotype field (GT) and dosage field (DS).
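The filtering and splitting described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper name sketch_vcf_to_snp_list is hypothetical, and the sketch assumes the sample column holds "GT:DS" strings with the genotype before the first colon.

```r
# Hypothetical helper, NOT the package's code: a base-R sketch of the
# filtering and splitting described in Details.
sketch_vcf_to_snp_list <- function(phased_vcf_table, sample_name) {
  # Keep single-nucleotide variants: REF and ALT are one character each
  is_snv <- nchar(as.character(phased_vcf_table$REF)) == 1 &
    nchar(as.character(phased_vcf_table$ALT)) == 1
  snps <- phased_vcf_table[is_snv, , drop = FALSE]

  # Split the sample field at the first colon (assumed "GT:DS" layout)
  sample_field <- as.character(snps[[sample_name]])
  snps$GT <- sub(":.*$", "", sample_field)
  snps$DS <- as.numeric(sub("^[^:]*:", "", sample_field))

  # One data frame per genotype category, plus the full SNP table
  genotypes <- c("1|1", "1|0", "0|1")
  by_gt <- lapply(genotypes, function(gt) snps[snps$GT == gt, , drop = FALSE])
  names(by_gt) <- genotypes
  c(list(all = snps), by_gt)
}

# Illustrative input: the third row is an indel and is filtered out
vcf_df <- data.frame(
  CHROM = c("chr1", "chr1", "chr1"),
  POS = c(1000, 2000, 3000),
  REF = c("A", "C", "AT"),
  ALT = c("G", "T", "A"),
  sample1 = c("1|0:1", "0|1:1", "1|1:2")
)
res <- sketch_vcf_to_snp_list(vcf_df, "sample1")
names(res)
```

Because the genotype names contain a pipe character, they are not syntactic R names; elements are accessed with res[["1|0"]] or with backticks, as in res$`1|0`.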

Examples

vcf_df <- data.frame(
  CHROM = c("chr1", "chr1"),
  POS = c(1000, 2000),
  REF = c("A", "C"),
  ALT = c("G", "T"),
  sample1 = c("1|0:0.5", "0|1:0.5")
)
snp_lists <- vcf_to_snp_list(vcf_df, "sample1")