Reveal the Cognitive Process of Deep Learning during Identifying Nucleosome Occupancy and Histone Modification

Nucleosome occupancy and histone modifications are among the most significant factors that influence gene expression. Although Chromatin Immunoprecipitation-chip (ChIP-chip) and Chromatin Immunoprecipitation-sequencing (ChIP-seq) have greatly accelerated the study of nucleosome regulation mechanisms over the life of an organism, these experimental techniques consume substantial material resources and may produce noisy data. Computational algorithms such as Support Vector Machines (SVMs) have been proposed to address these shortcomings. Recent studies have demonstrated that deep learning can not only overcome these limitations in many kinds of gene regulation tasks, but also exceed the prediction performance of SVMs. However, there is no explicit understanding of why the so-called “black box” performs so well. Here, we constructed a deep learning model to identify nucleosome occupancy and histone modifications, and evaluated its performance. Then, from two different perspectives (an alignment-based approach and an optimization-based approach), we examined the internal mechanism by which the deep learning model reveals the distribution of nucleosome occupancy and histone modification states in yeast. Our approaches not only achieved an accuracy improvement on the published datasets compared with traditional machine learning, but also recovered a sequence preference of nucleosome occupancy that is consistent with previous findings.
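
As a rough illustration of what an alignment-based view of a trained model might involve, the sketch below one-hot encodes DNA and summarizes a set of equal-length sequence windows as a position frequency matrix. It assumes the windows are the subsequences that most strongly activate one convolutional filter; the abstract does not specify this procedure, and the function names and example windows here are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def position_frequency_matrix(windows):
    """Column-wise base frequencies over equal-length, pre-aligned windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(w[i].upper() for w in windows)
        pfm.append([counts[b] / len(windows) for b in BASES])
    return pfm

def consensus(pfm):
    """Most frequent base at each position (ties go to the earlier base in BASES)."""
    return "".join(BASES[max(range(4), key=lambda j: row[j])] for row in pfm)

# Hypothetical windows standing in for the subsequences that most strongly
# activate a single filter; these are not data from the paper.
windows = ["AATTC", "AATAC", "TATTC", "AATTG"]
pfm = position_frequency_matrix(windows)
print(consensus(pfm))
```

A matrix of this kind can be rendered as a sequence logo and compared with known nucleosome sequence preferences.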
