Assessment of computational methods for the analysis of single-cell ATAC-seq data

Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, because of its low copy number (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared with transcriptomic (scRNA-seq) data (20-50% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level.

Results: We present a benchmarking framework that was applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were evaluated by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank the evaluated methods and discuss computational challenges associated with scATAC-seq analysis, including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed.

Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC was the only method able to analyze a large dataset (> 80,000 cells).
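The evaluation idea in the abstract, scoring how well clusters derived from a method's features recover known cell types, can be illustrated with a small sketch. The abstract does not name the scoring metric, so the adjusted Rand index is used here as one common choice (an assumption), and the cell-type labels and cluster assignments are made-up toy data, not results from the benchmark.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Chance-corrected agreement between two labelings of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the labelings trivially agree

    # Pairs of cells that share both a true label and a predicted cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs that share a true label, and pairs that share a predicted cluster.
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells together
    return (same_both - expected) / (maximum - expected)


# Hypothetical toy data: six cells of two known types, clustered two ways.
cell_types = ["T", "T", "T", "B", "B", "B"]
clusters_good = [0, 0, 0, 1, 1, 1]  # clusters match the cell types exactly
clusters_poor = [0, 0, 1, 1, 1, 1]  # one T cell grouped with the B cells

score_good = adjusted_rand_index(cell_types, clusters_good)
score_poor = adjusted_rand_index(cell_types, clusters_poor)
print(f"good: {score_good:.3f}, poor: {score_poor:.3f}")
```

A score of 1 means the clusters reproduce the cell types exactly, and scores near 0 indicate agreement no better than chance. In a benchmark of this kind, such a score would be computed per method, dataset, and clustering algorithm and then compared across methods.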
