Identifies clusters of adjacent genomic regions that show high correlation with each other. Uses a sliding window approach to find contiguous regions where the average pairwise correlation exceeds a specified threshold.

identify_region_clusters(cor_matrix, min_cluster_size, threshold)

Arguments

cor_matrix

A numeric correlation matrix where rows and columns represent genomic regions, with column names identifying the regions

min_cluster_size

Integer specifying the minimum number of regions required to form a cluster (must be >= 2)

threshold

Numeric value between -1 and 1 specifying the minimum average correlation required for regions to be considered part of the same cluster

Value

A list in which each element represents one cluster. Each element is a named integer vector:

  • Values are the positions of the regions in the cluster

  • Names are the corresponding region identifiers from the input correlation matrix
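As a rough illustration of working with a return value of this shape, the sketch below summarises a hand-built stand-in list. The object `example_clusters` and its contents are made up for illustration and are not output from `identify_region_clusters()`.

```r
# Hand-built stand-in for a result (hypothetical values, for illustration only)
example_clusters <- list(
  c(region_2 = 2L, region_3 = 3L, region_4 = 4L),
  c(region_7 = 7L, region_8 = 8L)
)

# Number of regions in each cluster
cluster_sizes <- lengths(example_clusters)

# Region identifiers of each cluster, taken from the vector names
cluster_ids <- lapply(example_clusters, names)

# First and last position of each cluster, one row per cluster
cluster_ranges <- t(vapply(example_clusters, range, integer(2)))
```

Because each cluster is a plain named integer vector, base R helpers such as `lengths()`, `names()`, and `range()` should be enough for most summaries.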

Details

The function implements a dynamic window-based clustering algorithm:

  • Starts with a window of minimum cluster size

  • Calculates average correlation (excluding diagonal) within the window

  • If average correlation exceeds threshold:

    • Extends window if not at matrix end

    • Finalizes cluster if at matrix end

  • If correlation drops below threshold:

    • Finalizes cluster if window size >= minimum

    • Moves to next position if window size < minimum
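The steps above can be sketched in a few lines of R. This is a minimal re-implementation written for illustration, not the package's source code. The names `sketch_region_clusters` and `mean_offdiag` are hypothetical, and the sketch assumes that clusters do not overlap, that a failing extension finalizes the window as it stood before the extension, and that an average exactly equal to the threshold does not qualify.

```r
# Hypothetical helper: mean of the off-diagonal entries of a square block
mean_offdiag <- function(block) {
  mean(block[row(block) != col(block)])
}

# Illustrative sketch only; the packaged function may differ in detail
sketch_region_clusters <- function(cor_matrix, min_cluster_size, threshold) {
  stopifnot(min_cluster_size >= 2, nrow(cor_matrix) == ncol(cor_matrix))
  n <- nrow(cor_matrix)
  ids <- colnames(cor_matrix)
  clusters <- list()
  start <- 1
  while (start + min_cluster_size - 1 <= n) {
    # Start with a window of the minimum cluster size
    end <- start + min_cluster_size - 1
    idx <- start:end
    if (mean_offdiag(cor_matrix[idx, idx]) <= threshold) {
      start <- start + 1  # initial window fails: move to the next position
      next
    }
    # Extend the window while the average stays above the threshold
    while (end < n) {
      idx_next <- start:(end + 1)
      if (mean_offdiag(cor_matrix[idx_next, idx_next]) <= threshold) break
      end <- end + 1
    }
    # Finalize the cluster as a named integer vector of positions
    clusters[[length(clusters) + 1]] <- setNames(start:end, ids[start:end])
    start <- end + 1  # assumption: the next search begins after this cluster
  }
  clusters
}

# Toy block-structured matrix: two groups of three tightly correlated regions
toy <- matrix(0.1, nrow = 6, ncol = 6)
toy[1:3, 1:3] <- 0.9
toy[4:6, 4:6] <- 0.8
diag(toy) <- 1
colnames(toy) <- rownames(toy) <- paste0("region_", 1:6)

res <- sketch_region_clusters(toy, min_cluster_size = 2, threshold = 0.7)
```

On this toy matrix the sketch finds two clusters, covering positions 1 to 3 and 4 to 6. Adding region 4 to the first window pulls the average off-diagonal correlation down to 0.5, which is what ends the first cluster.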

See also

Other clustering functions in the package for genomic analysis

Examples

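# Create a sample symmetric matrix with a unit diagonal
# (entries are random, so this is not guaranteed to be a valid correlation matrix)
set.seed(123)
n_regions <- 10
cor_mat <- matrix(runif(n_regions^2, -1, 1), nrow = n_regions)
# Mirror the lower triangle onto the upper triangle to make the matrix symmetric
cor_mat[upper.tri(cor_mat)] <- t(cor_mat)[upper.tri(cor_mat)]
diag(cor_mat) <- 1
# Column names identify the regions
colnames(cor_mat) <- paste0("region_", seq_len(n_regions))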

# Find clusters
clusters <- identify_region_clusters(
  cor_matrix = cor_mat,
  min_cluster_size = 3,
  threshold = 0.7
)