Gene Ontology-driven inference of protein-protein interactions using inducers

MOTIVATION: Protein-protein interactions (PPIs) are pivotal for many biological processes and similarity in Gene Ontology (GO) annotation has been found to be one of the strongest indicators for PPI. Most GO-driven algorithms for PPI inference combine machine learning and semantic similarity techniques. We introduce the concept of inducers as a method to integrate both approaches more effectively, leading to superior prediction accuracies.

RESULTS: An inducer (ULCA) in combination with a Random Forest classifier compares favorably to several sequence-based methods, semantic similarity measures and multi-kernel approaches. On a newly created set of high-quality interaction data, the proposed method achieves high cross-species prediction accuracies (Area under the ROC curve ≤ 0.88), rendering it a valuable companion to sequence-based methods.

AVAILABILITY: Software and datasets are available at http://bioinformatics.org.au/go2ppi/

CONTACT: m.ragan@uq.edu.au
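The abstract names the ULCA inducer but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: for a protein pair, collect the ancestors of both proteins' GO terms and keep those lying on paths up to (and including) their lowest common ancestors, then encode the result as a binary vector over all terms. The toy ontology, the term names, and the functions `ancestors`, `induce_ulca` and `feature_vector` are hypothetical and are not taken from the go2ppi software.

```python
# Toy GO-like DAG: term -> set of parent terms. All term names are made up.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "A1x": {"A1"},
    "B1": {"B"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def induce_ulca(terms_p, terms_q):
    """Assumed ULCA reading: terms of either protein (with ancestors)
    that sit at or below a lowest common ancestor of the pair."""
    anc_p = set().union(*(ancestors(t) for t in terms_p))
    anc_q = set().union(*(ancestors(t) for t in terms_q))
    common = anc_p & anc_q
    # A common ancestor is "lowest" if no other common term descends from it.
    lowest = {
        c for c in common
        if not any(c in ancestors(d) for d in common if d != c)
    }
    return {t for t in anc_p | anc_q if ancestors(t) & lowest}


def feature_vector(induced, vocabulary):
    """Binary vector: 1 where a vocabulary term is in the induced set."""
    return [1 if term in induced else 0 for term in vocabulary]


VOCABULARY = sorted(PARENTS)  # fixed term order shared by all protein pairs
induced = induce_ulca({"A1x", "B1"}, {"A2"})
vector = feature_vector(induced, VOCABULARY)
```

Under this assumption, one such vector per protein pair, labelled as interacting or non-interacting, would form the training input for a classifier such as a Random Forest; the classifier itself is not shown here.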
