DeepGOA: Predicting Gene Ontology Annotations of Proteins via Graph Convolutional Network

Gene Ontology (GO) uses a set of standardized, controlled terms to describe the molecular functions, biological process roles and cellular locations of gene products (i.e., proteins and RNAs), and organizes these terms in a directed acyclic graph (DAG). GO has more than 40000 terms, yet each protein is annotated with only several or a few dozen of them, so accurately selecting the relevant terms for a protein from such a large candidate set is difficult. Several deep learning models have been proposed that use the GO hierarchy for protein function prediction, but they exploit it inadequately. To use the knowledge encoded in the GO hierarchy, we propose DeepGOA, a deep Graph Convolutional Network (GCN) based model that predicts GO annotations of proteins. DeepGOA first uses GO annotations and the hierarchy to measure the correlations between GO terms and updates the edge weights of the DAG accordingly, and then applies a GCN to the updated DAG to learn the semantic representations and latent inter-relations of GO terms. In parallel, it uses a Convolutional Neural Network (CNN) to learn feature representations of amino acid sequences with respect to these semantic representations. DeepGOA then computes the dot product of the two representations, which allows the whole network to be trained end-to-end in a coherent fashion. Experiments on two model species (Human and Corn) show that DeepGOA outperforms state-of-the-art deep learning based methods. An ablation study indicates that the GCN exploits the knowledge encoded in GO and improves performance.
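The pipeline described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass only. This is not the authors' implementation: the toy four-term graph, its edge weights, the layer sizes, the random parameters, and all function names are assumptions made for illustration. Edges are treated as undirected so that the symmetric normalization of Kipf and Welling applies, and training is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w, activate=True):
    """One graph convolution: propagate over the graph, then project."""
    out = a_hat @ h @ w
    return np.maximum(out, 0.0) if activate else out

def one_hot(seq):
    """Encode an amino acid sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AMINO_ACIDS.index(c) for c in seq]] = 1.0
    return x

def cnn_encode(seq, filters):
    """1D convolution + ReLU + global max pooling.

    filters has shape (n_filters, kernel_size, 20); returns (n_filters,).
    """
    x = one_hot(seq)
    k = filters.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(seq) - k + 1)])
    feats = np.einsum("lka,fka->lf", windows, filters)
    return np.maximum(feats, 0.0).max(axis=0)

def predict(seq, adj, term_feats, w1, w2, filters):
    """Score every GO term for one protein via a dot product."""
    a_hat = normalize_adjacency(adj)
    hidden = gcn_layer(a_hat, term_feats, w1)
    term_repr = gcn_layer(a_hat, hidden, w2, activate=False)  # (n_terms, d)
    protein_repr = cnn_encode(seq, filters)                   # (d,)
    logits = term_repr @ protein_repr
    return 1.0 / (1.0 + np.exp(-logits))                      # sigmoid

# Toy graph of 4 GO terms; the weights stand in for correlation-updated
# edge weights and are invented for this example.
adj = np.array([
    [0.0, 0.8, 0.5, 0.0],
    [0.8, 0.0, 0.0, 0.6],
    [0.5, 0.0, 0.0, 0.3],
    [0.0, 0.6, 0.3, 0.0],
])

rng = np.random.default_rng(0)
term_feats = np.eye(4)                       # one-hot initial term features
w1 = rng.normal(scale=0.1, size=(4, 16))     # first GCN layer weights
w2 = rng.normal(scale=0.1, size=(16, 8))     # second GCN layer weights
filters = rng.normal(scale=0.1, size=(8, 3, 20))  # 8 filters, kernel size 3

scores = predict("MKTAYIAKQR", adj, term_feats, w1, w2, filters)
print(scores)  # one probability-like score per GO term
```

The number of CNN filters must equal the output width of the last GCN layer (8 here), since the dot product requires both representations to live in the same space. Because the score is a single differentiable expression of both sets of parameters, gradients of a multi-label loss could flow into the GCN and the CNN jointly, which is what makes end-to-end training possible.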
