Interspecies gene function prediction using semantic similarity

Background
Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary of terms to describe the molecular functions, biological roles and cellular locations of gene products in a hierarchical ontology. GO also provides annotations that associate genes with GO terms. Members of the GO Consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (species) they are interested in. Owing to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotation of the genes than to functional differences between them.

Results
Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with the similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether annotations of incompletely annotated genes can be replenished by using semantic similarity between genes from two homologous species. For this investigation, we use three representative semantic similarity metrics to compute the similarity between genes from two species. Next, we determine the k nearest neighbor genes across the two species under the chosen metric, and then use the terms annotated to the k neighbors of a gene to replenish the annotations of that gene. We perform experiments on archived GO annotations (January 2014 to January 2016) of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to improved accuracy (by 53.22%) than similarity between genes from a single species alone or between genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other.

Conclusions
Our study shows that semantic similarity based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote more research on semantic similarity based function prediction across species.
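
As a minimal illustration of the neighbor-based replenishment step described in the Results, the following Python sketch pools genes from two species, ranks them by similarity to a target gene, and proposes terms shared by enough of the k nearest neighbors. It is not the method evaluated in this study: Jaccard overlap of term sets is used here only as a simple stand-in for the three semantic similarity metrics, and the function names, gene identifiers, term identifiers, `k` and `min_votes` are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Toy stand-in for a semantic similarity metric: overlap of two term sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def replenish(target, annotations, k=2, min_votes=2):
    """Suggest terms for `target` from its k most similar genes.

    `annotations` maps gene id -> set of term ids and may pool two species.
    """
    own = annotations[target]
    # Rank every other gene by similarity to the target; ties broken by gene id.
    ranked = sorted(
        (g for g in annotations if g != target),
        key=lambda g: (-jaccard(own, annotations[g]), g),
    )
    neighbors = ranked[:k]
    # Count how many neighbors carry each term that the target lacks.
    votes = Counter(t for g in neighbors for t in annotations[g] - own)
    return sorted(t for t, n in votes.items() if n >= min_votes)


# Hypothetical gene ids and made-up term ids, for illustration only.
annotations = {
    "human:A": {"GO:1", "GO:2"},
    "human:B": {"GO:1", "GO:2", "GO:3"},
    "mouse:A": {"GO:1", "GO:2", "GO:3", "GO:4"},
    "mouse:C": {"GO:9"},
}
print(replenish("human:A", annotations))  # ['GO:3']
```

In this toy input the two nearest neighbors of `human:A` come from different species, and only the term carried by both of them is proposed. Lowering `min_votes` to 1 would also propose the term found in a single neighbor.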
