R/segment_relevant_functions.R
identify_region_clusters.Rd
Identifies clusters of adjacent genomic regions that show high correlation with each other. Uses a sliding window approach to find contiguous regions where the average pairwise correlation exceeds a specified threshold.
identify_region_clusters(cor_matrix, min_cluster_size, threshold)
cor_matrix: A numeric correlation matrix where rows and columns represent genomic regions, with column names identifying the regions
min_cluster_size: Integer specifying the minimum number of regions required to form a cluster (must be >= 2)
threshold: Numeric value between -1 and 1 specifying the minimum average correlation required for regions to be considered part of the same cluster
A list where each element represents a cluster. Each cluster contains:
Named integer vector with positions of regions in the cluster
Names of the vector correspond to the region identifiers from the input correlation matrix
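As an illustration of the return structure described above, the following hand-built list mimics what a result with two clusters might look like. It is not output from the package; it assumes that each list element is itself the named integer vector, and the object name `example_clusters` is hypothetical.

```r
# Hand-built illustration of the described return structure (not package output).
# Assumption: each list element is the named integer vector for one cluster.
example_clusters <- list(
  c(region_2 = 2L, region_3 = 3L, region_4 = 4L),
  c(region_7 = 7L, region_8 = 8L, region_9 = 9L, region_10 = 10L)
)

# Region identifiers of the first cluster
names(example_clusters[[1]])

# Positions of the first cluster in the correlation matrix
unname(example_clusters[[1]])

# Number of regions in each cluster
lengths(example_clusters)
```

Under this assumption, `names()` recovers the region identifiers and the vector values index directly into the rows and columns of the input matrix.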
The function implements a dynamic window-based clustering algorithm:
Starts with a window of minimum cluster size
Calculates average correlation (excluding diagonal) within the window
If average correlation exceeds threshold:
  Extends window if not at matrix end
  Finalizes cluster if at matrix end
If correlation drops below threshold:
  Finalizes cluster if window size >= minimum
  Moves to next position if window size < minimum
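The steps above can be sketched in a few lines of R. This is a minimal re-implementation under stated assumptions, not the package's source: the function name `sketch_region_clusters` and the helper `mean_offdiag` are hypothetical, clusters are assumed not to overlap (the search resumes right after a finalized cluster), and an average exactly equal to the threshold is treated as not exceeding it.

```r
# Hypothetical sketch of the window-based clustering described above.
# Not the package implementation; names and tie-handling are assumptions.
sketch_region_clusters <- function(cor_matrix, min_cluster_size, threshold) {
  stopifnot(
    is.matrix(cor_matrix),
    nrow(cor_matrix) == ncol(cor_matrix),
    min_cluster_size >= 2
  )
  n <- nrow(cor_matrix)
  region_names <- colnames(cor_matrix)
  if (is.null(region_names)) region_names <- as.character(seq_len(n))

  # Average of the off-diagonal entries of the sub-matrix at positions idx
  mean_offdiag <- function(idx) {
    sub <- cor_matrix[idx, idx, drop = FALSE]
    mean(sub[row(sub) != col(sub)])
  }

  clusters <- list()
  start <- 1L
  while (start + min_cluster_size - 1 <= n) {
    end <- start + min_cluster_size - 1
    if (mean_offdiag(start:end) <= threshold) {
      # Minimum-size window does not exceed the threshold: move one position on
      start <- start + 1L
      next
    }
    # Extend the window while the enlarged window still exceeds the threshold
    while (end < n && mean_offdiag(start:(end + 1)) > threshold) {
      end <- end + 1
    }
    # Finalize the cluster as a named integer vector of positions
    positions <- start:end
    names(positions) <- region_names[positions]
    clusters[[length(clusters) + 1]] <- positions
    # Assumption: clusters do not overlap, so resume after this one
    start <- end + 1
  }
  clusters
}

# Toy matrix: regions 1-3 are strongly correlated, the rest weakly
toy_cor <- matrix(0.1, nrow = 6, ncol = 6)
toy_cor[1:3, 1:3] <- 0.9
diag(toy_cor) <- 1
colnames(toy_cor) <- rownames(toy_cor) <- paste0("region_", 1:6)

toy_clusters <- sketch_region_clusters(toy_cor, min_cluster_size = 3, threshold = 0.7)
# One cluster is found, covering region_1 to region_3
toy_clusters
```

In the toy matrix, extending the window to include region_4 pulls the off-diagonal average down to 0.5, so the cluster is finalized at three regions.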
Other clustering functions in the package for genomic analysis
# Create sample correlation matrix
set.seed(123)
n_regions <- 10
cor_mat <- matrix(runif(n_regions^2, -1, 1), nrow = n_regions)
cor_mat[upper.tri(cor_mat)] <- t(cor_mat)[upper.tri(cor_mat)]
diag(cor_mat) <- 1
colnames(cor_mat) <- paste0("region_", 1:n_regions)
# Find clusters
clusters <- identify_region_clusters(
  cor_matrix = cor_mat,
  min_cluster_size = 3,
  threshold = 0.7
)
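A matrix filled with uniform random values is unlikely to contain a contiguous run of regions whose average correlation exceeds 0.7. As an alternative input, the sketch below builds a proper correlation matrix with `cor()` from simulated data in which the first four regions share a common signal. All object names (`shared`, `signal_mat`, `cor_mat2`) and the simulation settings are hypothetical choices for illustration.

```r
# Simulated data: 200 samples (rows) across 8 regions (columns)
set.seed(42)
n_samples <- 200
shared <- rnorm(n_samples)

# Regions 1-4 share a common signal plus small noise; regions 5-8 are pure noise
signal_mat <- cbind(
  sapply(1:4, function(i) shared + rnorm(n_samples, sd = 0.3)),
  sapply(1:4, function(i) rnorm(n_samples))
)
colnames(signal_mat) <- paste0("region_", 1:8)

# Pairwise Pearson correlations between regions
cor_mat2 <- cor(signal_mat)

# Inspect the boundary between the correlated block and the noise regions
round(cor_mat2[1:5, 1:5], 2)
```

The resulting `cor_mat2` is symmetric with a unit diagonal and carries the region names as column names, so it matches the documented form of the `cor_matrix` argument.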