DNA-GCN: Graph convolutional networks for predicting DNA-protein binding

Predicting DNA-protein binding is a classic and important problem in bioinformatics. Convolutional neural networks have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. However, no prior study has used graph convolutional networks for motif inference, and in this work we propose doing so. We build a single sequence k-mer graph for the whole dataset, based on k-mer co-occurrence and on the relationship between k-mers and sequences, and then train a DNA Graph Convolutional Network (DNA-GCN) on that graph. DNA-GCN is initialized with a one-hot representation for every node and jointly learns embeddings for both k-mers and sequences, supervised by the known sequence labels. We evaluate the model on 50 datasets from ENCODE, where DNA-GCN performs competitively with the baseline model. We also analyze the model and design several alternative architectures to fit different datasets.
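The graph construction and one-hot initialization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state how edges are weighted, so raw counts are used here as an assumption, and the names `kmers`, `build_graph`, `normalize`, and `gcn_forward`, the window size, and the hidden width are all hypothetical. The two-layer propagation follows the standard graph convolutional network formulation with a symmetrically normalized adjacency matrix.

```python
import numpy as np

def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_graph(seqs, k=3, window=2):
    """Build one graph whose nodes are k-mers followed by sequences.

    Assumption: edge weights are raw counts. k-mer/k-mer edges count
    co-occurrence within `window` positions; k-mer/sequence edges count
    how often the k-mer appears in the sequence.
    """
    vocab = sorted({km for s in seqs for km in kmers(s, k)})
    index = {km: i for i, km in enumerate(vocab)}
    n_kmer = len(vocab)
    adj = np.eye(n_kmer + len(seqs))  # self-loops on every node
    for s_id, s in enumerate(seqs):
        ids = [index[km] for km in kmers(s, k)]
        row = n_kmer + s_id
        for pos, a in enumerate(ids):
            adj[row, a] += 1
            adj[a, row] += 1
            for b in ids[pos + 1:pos + 1 + window]:
                if a != b:
                    adj[a, b] += 1
                    adj[b, a] += 1
    return adj, vocab

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_hat, w0, w1):
    """Two-layer GCN forward pass with one-hot (identity) node features."""
    x = np.eye(a_hat.shape[0])           # one-hot representation per node
    h = np.maximum(a_hat @ x @ w0, 0.0)  # first layer + ReLU
    logits = a_hat @ h @ w1              # second layer
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)  # softmax over classes

# Toy usage with two short sequences and random, untrained weights.
seqs = ["ACGTAC", "TTGACG"]
adj, vocab = build_graph(seqs, k=3, window=2)
a_hat = normalize(adj)
rng = np.random.default_rng(0)
w0 = rng.normal(size=(adj.shape[0], 8))
w1 = rng.normal(size=(8, 2))
probs = gcn_forward(a_hat, w0, w1)
```

In a training setup of this kind, only the rows of `probs` that correspond to labeled sequence nodes would contribute to the loss, while the k-mer rows receive gradient only through message passing; that is one way to read "jointly learns the embeddings for both k-mers and sequences".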
