Epiclomal: Probabilistic clustering of sparse single-cell DNA methylation data

We present Epiclomal, a probabilistic clustering method based on a hierarchical mixture model that simultaneously clusters sparse single-cell DNA methylation data and imputes missing values. Using synthetic and published single-cell CpG datasets, we show that Epiclomal outperforms non-probabilistic methods and can handle the extensive missing data that is inherent to, and dominates, single-cell CpG sequencing. Using newly generated single-cell 5mCpG sequencing data, we show that Epiclomal discovers sub-clonal methylation patterns in aneuploid tumour genomes, thus defining epiclones that can match or transcend copy number-determined clonal lineages and opening up an important form of clonal analysis in cancer. Epiclomal is written in R and Python and is available at https://github.com/shahcompbio/Epiclomal.
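To illustrate the general idea of clustering binary methylation calls while imputing missing entries, the following is a minimal sketch of a plain Bernoulli mixture fitted by expectation-maximization. It is not Epiclomal's model or inference procedure, and it does not use the Epiclomal package; the function name `fit_bernoulli_mixture`, its parameters, the pseudocounts, and the 0.5 imputation threshold are all assumptions made for this example. The input is assumed to be a cells-by-CpG matrix with entries 0 (unmethylated), 1 (methylated), or NaN (missing).

```python
import numpy as np

def fit_bernoulli_mixture(X, n_clusters, n_iter=50, seed=0):
    """EM for a Bernoulli mixture on a matrix with entries 0, 1, or NaN.

    Hypothetical helper for illustration only; not part of Epiclomal.
    Returns hard cluster labels, per-cluster methylation rates, and a
    copy of X in which missing entries are filled in.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_sites = X.shape
    observed = ~np.isnan(X)
    obs = observed.astype(float)
    X0 = np.where(observed, X, 0.0)  # missing entries zeroed, masked below

    pi = np.full(n_clusters, 1.0 / n_clusters)               # mixing weights
    mu = rng.uniform(0.25, 0.75, size=(n_clusters, n_sites))  # methylation rates

    for _ in range(n_iter):
        # E-step: the log-likelihood sums over observed entries only.
        ll = X0 @ np.log(mu).T + ((1.0 - X0) * obs) @ np.log1p(-mu).T
        ll += np.log(pi)
        ll -= ll.max(axis=1, keepdims=True)  # stabilise before exponentiating
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: pseudocounts (assumed values) keep mu strictly in (0, 1).
        mu = (resp.T @ X0 + 1.0) / (resp.T @ obs + 2.0)
        pi = (resp.sum(axis=0) + 1.0) / (n_cells + n_clusters)

    labels = resp.argmax(axis=1)
    # Impute each missing entry from the responsibility-weighted cluster rates.
    imputed = np.where(observed, X, (resp @ mu > 0.5).astype(float))
    return labels, mu, imputed
```

Because missing entries are masked out of both the likelihood and the parameter updates, a cell with very few observed CpGs still receives a cluster assignment, informed by the cells that share its observed sites. This borrowing of information across cells is the intuition behind joint clustering and imputation, although the sketch omits the hierarchical structure described above.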
