GONET: A Deep Network to Annotate Proteins via Recurrent Convolution Networks

Precisely determining the functions of proteins in life activities is nontrivial and lies at the core of current proteomics research. Gene Ontology (GO) standardizes protein function into a series of GO terms, each of which belongs to exactly one of three subontologies: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). Because a protein can be annotated with several GO terms at once, protein function prediction can be treated as a multi-label classification problem. Traditional methods often incur high costs to extract handcrafted features and require substantial domain knowledge, whereas deep learning can overcome these shortcomings. Here, we propose GONET, a deep model based on recurrent convolutional neural networks that annotates proteins in an end-to-end manner. Our model combines protein sequences with protein-protein interaction (PPI) network data and uses representation learning to learn distributed representations of proteins, overcoming the problems of sparsity and semantic independence. Moreover, we adopt a fairly deep CNN-RNN-Attention model that can effectively extract high-order features from protein sequences. We carried out experiments on several datasets, and the model achieves state-of-the-art results on some metrics compared with existing competitive methods.
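
To make the multi-label formulation concrete, the following is a minimal sketch of how GO annotations can be turned into a multi-hot target vector and how per-term scores can be turned back into predicted terms. It is not GONET's implementation: the three-term vocabulary (the GO root terms of BP, CC, and MF), the helper names `encode_labels` and `decode_predictions`, the example scores, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Illustrative label vocabulary (assumption): the root terms of the three
# subontologies, in the order BP, CC, MF. A real vocabulary would contain
# many more GO terms.
GO_TERMS: List[str] = ["GO:0008150", "GO:0005575", "GO:0003674"]


def encode_labels(annotations: Sequence[str],
                  vocab: Sequence[str] = GO_TERMS) -> List[int]:
    """Return a multi-hot vector with 1 where the protein has the term."""
    present = set(annotations)
    return [1 if term in present else 0 for term in vocab]


def decode_predictions(scores: Sequence[float],
                       vocab: Sequence[str] = GO_TERMS,
                       threshold: float = 0.5) -> List[str]:
    """Keep every term whose independent score reaches the threshold."""
    return [term for term, score in zip(vocab, scores) if score >= threshold]


# One protein annotated with an MF term and a BP term.
target = encode_labels(["GO:0003674", "GO:0008150"])

# Hypothetical per-term scores, e.g. sigmoid outputs of a classifier.
predicted = decode_predictions([0.91, 0.12, 0.67])

print(target)
print(predicted)
```

Each term is scored independently here, so a protein can receive any number of labels; this is what distinguishes the multi-label setting from ordinary single-label classification.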
