MethylNet: an automated and modular deep learning approach for DNA methylation analysis

Background: DNA methylation (DNAm) is an epigenetic regulator of gene expression programs that can be altered by environmental exposures, aging, and pathogenesis. Traditional analyses that associate DNAm alterations with phenotypes suffer from multiple hypothesis testing and multi-collinearity due to the high-dimensional, continuous, interacting, and non-linear nature of the data. Deep learning analyses have shown much promise for studying disease heterogeneity. However, DNAm deep learning approaches have not yet been formalized into user-friendly frameworks for executing, training, and interpreting models. Here, we describe MethylNet, a DNAm deep learning method that can construct embeddings, make predictions, generate new data, and uncover unknown heterogeneity with minimal user supervision.

Results: The results of our experiments indicate that MethylNet can study cellular differences, capture higher-order information about cancer subtypes, estimate age, and capture factors associated with smoking, in concordance with known differences.

Conclusion: The ability of MethylNet to capture non-linear interactions presents an opportunity for further study of unknown disease, cellular heterogeneity, and aging processes.
