A cosine similarity-based method to infer variability of chromatin accessibility at the single-cell level

Cellular identity is propagated between generations of developing cells through the epigenome, particularly via the accessible parts of the chromatin. It is now possible to measure chromatin accessibility at single-cell resolution using the single-cell assay for transposase-accessible chromatin (scATAC-seq), which can reveal the regulatory variation behind phenotypic variation. However, single-cell chromatin accessibility data are sparse, binary, and high-dimensional, which poses unique computational challenges. To overcome these difficulties, we developed PRISM, a computational workflow and R package (https://github.com/stanleycai123/PRISM) that quantifies cell-to-cell variation in chromatin accessibility while controlling for technical biases. Using data that were either generated in our lab or publicly available, we show that PRISM outperforms an existing algorithm that relies on the aggregate signal across a set of genomic regions. PRISM is robust to noise in cells with low accessibility and reveals previously masked accessibility variation in cases where the accessible sites differ between cells but the total number of accessible sites is constant. We also show that PRISM, but not an existing algorithm, detects suppressed heterogeneity of accessibility at CTCF binding sites. PRISM is a novel multidimensional scaling-based method that couples angular cosine distance metrics with distance from the spatial centroid, and it takes into account differences in accessibility between single cells at each genomic region. This updated approach uncovers new biological results with profound implications for the cellular heterogeneity of chromatin architecture.
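The general idea of combining an angular cosine distance with multidimensional scaling and distance from the centroid can be illustrated with a minimal Python sketch. This is not the PRISM implementation (PRISM is an R package, and its bias controls are not reproduced here). The function names, the use of classical (Torgerson) MDS, the scaling of the angle to [0, 1], and the toy matrices are all assumptions made for illustration.

```python
import numpy as np

def angular_cosine_distance(X):
    """Pairwise angular distance between rows of a cells x regions matrix.

    Assumption: the angle is scaled by 2/pi so that non-negative (binary)
    vectors give distances in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    if np.any(norms == 0):
        raise ValueError("every cell needs at least one accessible region")
    cos = (X @ X.T) / np.outer(norms, norms)
    cos = np.clip(cos, -1.0, 1.0)          # guard against rounding error
    D = 2.0 * np.arccos(cos) / np.pi
    np.fill_diagonal(D, 0.0)
    return D

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # keep the k largest
    vals = np.clip(vals[order], 0.0, None) # drop small negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)

def centroid_dispersion(X, k=2):
    """Distance of each cell from the centroid of the MDS embedding."""
    coords = classical_mds(angular_cosine_distance(X), k)
    centroid = coords.mean(axis=0)
    return np.linalg.norm(coords - centroid, axis=1)

# Toy data (hypothetical): 6 cells x 12 regions, 4 accessible sites per cell.
# In the homogeneous population every cell opens the same 4 sites.
homogeneous = np.tile([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], (6, 1))

# In the heterogeneous population each cell opens a different window of 4
# sites, so the per-cell total is unchanged but the sites differ.
heterogeneous = np.zeros((6, 12), dtype=int)
for i in range(6):
    heterogeneous[i, [(2 * i + j) % 12 for j in range(4)]] = 1

homogeneous_score = centroid_dispersion(homogeneous).mean()
heterogeneous_score = centroid_dispersion(heterogeneous).mean()
```

In this toy case both populations have the same number of accessible sites per cell, so a score based only on the aggregate signal cannot separate them, whereas the mean distance from the centroid is zero for the homogeneous population and positive for the heterogeneous one.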