R/pattern_extraction.R
extend_blueprint_cn_for_equal_bin.Rd
This function extends a copy number (CN) data frame (seg_dat) so that each genomic window
is represented by equal-sized bins of a specified size (bin_unit). It replicates each column of
the input data frame according to the size of its window and the desired bin size, returning a
new data frame with the expanded columns.
extend_blueprint_cn_for_equal_bin(seg_dat, window_sizes, bin_unit)
seg_dat: A data frame of copy number segments where:
Rows represent clones/samples
Columns represent genomic windows
Values represent copy numbers
window_sizes: A numeric vector giving the size of each window in base pairs; its length must match the number of columns in seg_dat
bin_unit: The desired size of each bin in base pairs
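Because window_sizes must line up one-to-one with the columns of seg_dat, it can help to derive and check it before the call. The following is a minimal sketch, assuming column names follow the "<chr>_<start>_<end>" pattern seen in the example on this page; the helper window_sizes_from_names is hypothetical and not part of the package.

```r
# Hypothetical helper (not provided by the package): derive window sizes
# from column names of the assumed form "<chr>_<start>_<end>".
window_sizes_from_names <- function(window_names) {
  parts <- strsplit(window_names, "_", fixed = TRUE)
  # Take the last two underscore-separated fields as start and end
  starts <- as.numeric(vapply(parts, function(p) p[length(p) - 1], character(1)))
  ends <- as.numeric(vapply(parts, function(p) p[length(p)], character(1)))
  ends - starts
}

seg_data <- data.frame(
  chr1_1000_3000 = c(2, 1),
  chr1_3000_4000 = c(1, 1),
  row.names = c("clone1", "clone2")
)

window_sizes <- window_sizes_from_names(colnames(seg_data))

# One size per column, as the window_sizes argument requires
stopifnot(length(window_sizes) == ncol(seg_data))
```

If the column names in your data use a different convention, supply window_sizes directly instead.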
A data frame with:
Same number of rows as input
Expanded columns based on window sizes and bin unit
Column names as "original_window_name_binIndex"
Original row names preserved
The function:
Calculates how many bins each window needs based on its size
Replicates copy number values to fill the required bins
Maintains data continuity while creating equal-sized segments
Window sizes that aren't exact multiples of bin_unit will be rounded up to ensure complete coverage
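The steps above can be sketched in a few lines of base R. This is an illustrative re-implementation under stated assumptions, not the package's source: the name expand_to_equal_bins is hypothetical, and the bin index in the column names is assumed to start at 1 and restart for each window.

```r
# Hypothetical sketch of the described behaviour; not the package's code.
expand_to_equal_bins <- function(seg_dat, window_sizes, bin_unit) {
  stopifnot(length(window_sizes) == ncol(seg_dat), bin_unit > 0)

  # Bins needed per window; sizes that are not exact multiples round up
  n_bins <- ceiling(window_sizes / bin_unit)

  # Repeat each column index once for every bin that window occupies
  col_idx <- rep(seq_len(ncol(seg_dat)), times = n_bins)
  expanded <- seg_dat[, col_idx, drop = FALSE]

  # Name bins "<window>_<binIndex>" (index assumed to start at 1 per window)
  colnames(expanded) <- paste0(colnames(seg_dat)[col_idx], "_", sequence(n_bins))
  expanded
}

seg_data <- data.frame(
  chr1_1000_3000 = c(2, 1),
  chr1_3000_4000 = c(1, 1),
  row.names = c("clone1", "clone2")
)

out <- expand_to_equal_bins(seg_data, window_sizes = c(2000, 1000), bin_unit = 1000)
print(out)
```

With these inputs the 2000 bp window contributes two columns and the 1000 bp window contributes one, so the sketch returns three columns while keeping the two row names. A 2500 bp window with the same bin_unit would round up to three bins.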
if (FALSE) { # \dontrun{
seg_data <- data.frame(
chr1_1000_3000 = c(2, 1),
chr1_3000_4000 = c(1, 1),
row.names = c("clone1", "clone2")
)
window_sizes <- c(2000, 1000)
bin_unit <- 1000
expanded <- extend_blueprint_cn_for_equal_bin(seg_data, window_sizes, bin_unit)
# Returns expanded data frame with bins of 1000bp each
} # }