Predicting DNA Methylation States with Hybrid Information Based Deep-Learning Model

DNA methylation plays an important role in regulating some biological processes. With the development of machine learning, several sequence-based deep learning models have been designed to predict DNA methylation states, and they outperform traditional methods such as random forest and SVM. However, convolutional deep learning models that take only one-hot encoded DNA sequence as input may capture limited information and yield unsatisfactory prediction performance, so additional data types and model structures should be considered. In this work, we propose MHCpG, a hybrid sequence-based deep learning model that uses both MeDIP-seq data and histone information to predict the methylation states of CpG sites. We combine MeDIP-seq data and histone modification data with sequence information and apply a convolutional network to discover sequence patterns. In addition, we compute statistical features from these three inputs and use a 3-layer feedforward neural network to extract higher-level features. We compared our method with a traditional random forest predictor and with previous methods such as CpGenie and DeepCpG; the results show that MHCpG outperforms these approaches.
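To make the input representation concrete, the following is a minimal sketch of one-hot encoding a DNA window and attaching per-position MeDIP-seq and histone signals as extra channels. The channel layout, the function names (`one_hot`, `hybrid_input`, `summary_stats`), and the choice of mean, minimum, and maximum as summary statistics are illustrative assumptions; the abstract does not specify how MHCpG combines its inputs or which statistics it uses.

```python
from typing import Dict, List

BASES = "ACGT"  # channel order for the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def hybrid_input(seq: str, medip: List[float], histone: List[float]) -> List[List[float]]:
    """Append per-position signal values as two extra channels.

    This layout is a hypothetical choice for illustration, not the
    published MHCpG input format.
    """
    if not (len(seq) == len(medip) == len(histone)):
        raise ValueError("sequence and signal tracks must have equal length")
    return [row + [m, h] for row, m, h in zip(one_hot(seq), medip, histone)]


def summary_stats(values: List[float]) -> Dict[str, float]:
    """Simple per-window statistics (assumed: mean, min, max)."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    window = hybrid_input("ACGN", [0.5, 1.0, 2.0, 0.0], [1.0, 1.0, 0.0, 0.0])
    for row in window:
        print(row)
    print(summary_stats([0.5, 1.0, 2.0, 0.0]))
```

In a setup like this, the per-position matrix would feed a convolutional network, while the summary statistics would feed a separate feedforward network.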
