Convolutional Neural Networks Grouped by Transcription Factors for Predicting Protein-DNA Binding Site

Understanding the specific interactions between transcription factors (TFs) and DNA is essential for comprehending regulatory processes in biological systems. Recently, deep learning algorithms have outperformed conventional, time-consuming and expensive methods such as ChIP-seq in predicting the sequence specificities of DNA-protein binding. However, because TF binding is cell-specific, most current deep learning methods build one model for each TF-cell line combination. This leads to problems such as the complexity of maintaining numerous models and the poor prediction performance of some models for cell lines without enough ChIP-seq data. It is therefore useful to build models that offer both higher accuracy and a wider range of application. We propose a method that builds a series of convolutional neural network (CNN)-based models grouped by TF, which we call TF models. Trained on the same database of 554 ChIP-seq datasets, the proposed TF models outperform DeepBind in the motif discovery task. On the one hand, the number of models is reduced from 554 to 72, which extends the application scope of each model. On the other hand, TF models achieve a higher AUC than DeepBind on 94.2% of TF-cell line combinations. Moreover, we demonstrate that TF models achieve an average AUC of 0.909 when predicting the binding of TFs in cell lines that lack ChIP-seq data.
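The abstract does not specify the TF models' architecture or training procedure, so the following is only a minimal sketch under stated assumptions. It pools hypothetical (TF, cell line, sequences, labels) records into one training set per TF instead of one per TF-cell line pair, scores sequences with a DeepBind-style forward pass (one-hot encoding, convolution, ReLU, global max pooling, linear output), and computes AUC as the probability that a random positive outscores a random negative. The filter is hand-set to the motif GATA rather than learned, training is omitted, and every function name, record, and toy sequence is hypothetical.

```python
from collections import defaultdict

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot columns (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def group_by_tf(datasets):
    """Hypothetical grouping step: pool every cell line's data for a TF into one set."""
    pooled = defaultdict(lambda: ([], []))
    for tf, _cell_line, seqs, labels in datasets:
        pooled[tf][0].extend(seqs)
        pooled[tf][1].extend(labels)
    return dict(pooled)

def motif_filter(motif):
    """Hypothetical hand-set filter: +1 for the motif base, -1 for other bases."""
    return [[1.0 if base == m else -1.0 for base in BASES] for m in motif]

def conv_max_score(seq, filters, weights, bias):
    """DeepBind-style forward pass: convolution -> ReLU -> global max pool -> linear."""
    x = one_hot(seq)
    pooled = []
    for f in filters:  # f: k columns, each a 4-element weight vector
        k = len(f)
        best = 0.0  # starting at 0 applies the ReLU: max(0, max window score)
        for i in range(len(x) - k + 1):
            s = sum(f[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            best = max(best, s)
        pooled.append(best)
    return bias + sum(w * p for w, p in zip(weights, pooled))

def auc(scores, labels):
    """ROC AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy records: two cell lines for GATA1 collapse into a single GATA1 model.
    datasets = [
        ("GATA1", "K562", ["TTGATAAG", "CCCCCCCC"], [1, 0]),
        ("GATA1", "HepG2", ["AGATAACT", "TTTTACGT"], [1, 0]),
        ("CTCF", "K562", ["CCGCGTGG", "AAAAAAAA"], [1, 0]),
    ]
    models = group_by_tf(datasets)
    seqs, labels = models["GATA1"]
    scores = [conv_max_score(s, [motif_filter("GATA")], [1.0], 0.0) for s in seqs]
    print(len(models), auc(scores, labels))  # prints: 2 1.0
```

Three records collapse into two per-TF training sets, mirroring on a tiny scale the reduction from per-combination models to per-TF models; in a real setting the filter weights would be learned from the pooled data.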
