Sparse functional data analysis accounts for missing information in single-cell epigenomics

Single-cell epigenome assays produce sparsely sampled data, so coverage is commonly pooled across cells to increase resolution. Imputation of missing data using deep learning is available, but it is computationally intensive and has been applied only to DNA methylation obtained by single-cell bisulfite sequencing. Here, sparsity in chromatin accessibility obtained by scNMT-seq is addressed using functional data analysis: the sparsely sampled GpC coverage profile of each individual cell is fitted taking into account all the cells of the same cell type or condition. To do this, sparse functional principal component analysis (S-FPCA) is applied, and the resulting principal components are used to estimate chromatin accessibility coverage in individual cells. This methodology can potentially be used with other single-cell assays with missing data, such as scBS-seq, scNOME-seq, or scATAC-seq. The R package fdapace is available on CRAN, and the R code used in this manuscript can be found at: http://github.com/pmb59/sparseSingleCell.
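
The abstract does not write out the model behind S-FPCA. As a rough sketch only, the standard formulation of FPCA for sparsely sampled curves (principal analysis by conditional expectation, the approach the fdapace package is built around) can be stated as follows. The notation is assumed here for illustration and is not taken from the manuscript: Y_ij is the observed GpC coverage of cell i at its j-th covered position t_ij, X_i is the underlying smooth accessibility profile of that cell, and K is the number of retained components.

```latex
\begin{align}
  % Noisy, sparse observations of each cell's smooth profile
  Y_{ij} &= X_i(t_{ij}) + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim (0, \sigma^2), \\
  % Truncated Karhunen-Loeve expansion of the profile
  X_i(t) &\approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t), \\
  % Component scores estimated by conditional expectation,
  % using only the positions observed in cell i
  \hat{\xi}_{ik} &= \hat{\lambda}_k\,\hat{\boldsymbol{\phi}}_{ik}^{\top}\,
  \hat{\Sigma}_{Y_i}^{-1}\,\bigl(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i\bigr),
  \qquad
  \bigl(\hat{\Sigma}_{Y_i}\bigr)_{jl} = \hat{G}(t_{ij}, t_{il}) + \hat{\sigma}^2\,\delta_{jl}, \\
  % Reconstructed coverage profile for cell i
  \hat{X}_i(t) &= \hat{\mu}(t) + \sum_{k=1}^{K} \hat{\xi}_{ik}\,\hat{\phi}_k(t).
\end{align}
```

In this formulation the mean function \mu and the covariance surface G (with eigenvalues \lambda_k and eigenfunctions \phi_k) are estimated by smoothing the observations pooled over all cells of the group, which is how the fit for one sparsely covered cell can borrow information from the other cells of the same cell type or condition.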
