exp2GO: Improving Prediction of Functions in the Gene Ontology With Expression Data

Computational methods for predicting gene function annotations aim to automatically find associations between a gene and the set of Gene Ontology (GO) terms that describe its functions. Manual curation of novel annotations and their validation through wet-lab experiments are time-consuming and costly. There is therefore a need for computational tools that can reliably predict likely annotations and speed up the discovery of new gene functions. This work proposes a novel method, exp2GO, that predicts annotations by inferring GO similarities from expression similarities. The method was benchmarked against other methods on several public biological datasets and obtained the best results among the methods compared, improving the prediction of GO annotations over state-of-the-art methods. Furthermore, the proposal was validated on a full-genome case study, in which it predicted relevant and accurate biological functions. The repository of this project, with full data and code, is available at https://github.com/sinc-lab/exp2GO.
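The core intuition, that genes with similar expression profiles tend to share GO terms, can be illustrated with a minimal Python sketch. This is not the exp2GO algorithm. It substitutes a simple neighbor-voting scheme and makes several assumptions: Pearson correlation clipped at zero serves as the expression similarity, and annotations form a binary gene-by-term matrix. The function names `expression_similarity` and `propagate_annotations` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den if den > 0 else 0.0


def expression_similarity(expr):
    """Hypothetical gene-by-gene similarity: Pearson correlation clipped at 0.

    The diagonal is zeroed so a gene does not vote for its own annotations.
    """
    g = len(expr)
    return [
        [0.0 if i == j else max(0.0, pearson(expr[i], expr[j])) for j in range(g)]
        for i in range(g)
    ]


def propagate_annotations(sim, annot):
    """Hypothetical neighbor voting over a binary gene-by-GO-term matrix.

    scores[g][t] is the similarity-weighted fraction of g's neighbors that are
    annotated with term t (0.0 if g has no positively correlated neighbor).
    """
    n_terms = len(annot[0])
    scores = []
    for row in sim:
        total = sum(row)
        scores.append([
            sum(w * a[t] for w, a in zip(row, annot)) / total if total > 0 else 0.0
            for t in range(n_terms)
        ])
    return scores
```

For example, a gene with no annotations whose profile closely tracks a gene annotated with some term receives a high score for that term. Anti-correlated genes contribute nothing because their similarity is clipped to zero. Clipping is a design choice made here only to keep the weights nonnegative.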