Learning Deep Features for DNA Methylation Data Analysis

Many studies have demonstrated that DNA methylation, which occurs at CpG sites, is strongly correlated with diseases, including cancer. There is strong interest in analyzing DNA methylation data to distinguish different tumor subtypes. However, conventional statistical methods are not well suited to DNA methylation data, which are high-dimensional and have bounded support. To explicitly capture these properties of the data, we design a deep neural network, composed of several stacked binary restricted Boltzmann machines, that learns low-dimensional deep features of the DNA methylation data. Experimental results show that, in cluster analysis of breast cancer DNA methylation data, these features outperform several state-of-the-art methods.
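To make the idea of stacked binary restricted Boltzmann machines concrete, the following is a minimal sketch, not the authors' implementation. It assumes greedy layer-wise training with one-step contrastive divergence (CD-1), treats methylation values in [0, 1] as activation probabilities of the visible units, and uses synthetic data. The class name `BinaryRBM`, the helper `stacked_rbm_features`, the layer sizes, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_visible = np.zeros(n_visible)
        self.b_hidden = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hidden)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_visible)

    def cd1_step(self, v0, lr):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_visible += lr * (v0 - v1).mean(axis=0)
        self.b_hidden += lr * (h0 - h1).mean(axis=0)

    def fit(self, data, epochs=20, lr=0.1, batch_size=16):
        for _ in range(epochs):
            order = self.rng.permutation(len(data))
            for start in range(0, len(data), batch_size):
                self.cd1_step(data[order[start:start + batch_size]], lr)
        return self


def stacked_rbm_features(data, layer_sizes, seed=0):
    """Greedy layer-wise training; returns top-layer hidden probabilities."""
    rng = np.random.default_rng(seed)
    features = data
    for n_hidden in layer_sizes:
        rbm = BinaryRBM(features.shape[1], n_hidden, rng).fit(features)
        # The hidden probabilities of one RBM become the input of the next.
        features = rbm.hidden_probs(features)
    return features


# Synthetic stand-in for methylation values (assumption): 60 samples,
# 200 CpG sites, each value drawn from a Beta distribution on (0, 1).
data_rng = np.random.default_rng(42)
beta_values = data_rng.beta(0.5, 0.5, size=(60, 200))

deep_features = stacked_rbm_features(beta_values, layer_sizes=[64, 16, 4])
print(deep_features.shape)  # (60, 4)
```

The resulting low-dimensional matrix could then be passed to any clustering routine. The layer sizes here shrink the 200 inputs to 4 features purely for demonstration and say nothing about the architecture used in the paper.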
