Assessing reliability of protein-protein interactions by gene ontology integration

Recent advances in genome-wide identification of protein-protein interactions (PPIs) have produced an abundance of interaction data, which provide insight into functional associations among proteins. However, PPI datasets determined by high-throughput experiments or inferred by computational methods are known to contain a very large number of false positives. Using Gene Ontology (GO) and its annotations, we assess the reliability of PPIs by measuring the semantic similarity of the interacting proteins. Protein pairs with high semantic similarity are likely to share common functions and are therefore more likely to interact. We analyze the performance of existing semantic similarity measures in terms of functional consistency and propose a combined hybrid method that improves on them. The semantic similarity measures are then applied as classifiers to identify false positive PPIs. The classification results show that the combined hybrid method achieves higher accuracy than the existing measures. Furthermore, the combined hybrid classifier predicts that 59.6% of the S. cerevisiae PPIs in the BioGRID database are false positives.
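
The idea of scoring an interaction by the semantic similarity of its two proteins can be illustrated with a minimal sketch. It is not the combined hybrid method proposed here; it assumes a Resnik-style information-content measure, a maximum over annotation pairs, and a fixed cutoff. The toy ontology, the protein annotations, the threshold value, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO IDs).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical direct annotations: protein -> set of terms.
ANNOTATIONS = {
    "P1": {"dna_binding"},
    "P2": {"rna_binding"},
    "P3": {"kinase"},
    "P4": {"dna_binding", "kinase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of proteins annotated to t or a descendant."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(t1, t2):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def protein_similarity(p1, p2):
    """Maximum term similarity over all pairs of annotations (one common choice, assumed here)."""
    return max(
        term_similarity(a, b)
        for a in ANNOTATIONS[p1]
        for b in ANNOTATIONS[p2]
    )


def is_likely_false_positive(p1, p2, threshold=0.2):
    """Flag an interaction whose similarity falls below an assumed cutoff."""
    return protein_similarity(p1, p2) < threshold


if __name__ == "__main__":
    for pair in [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]:
        print(pair, round(protein_similarity(*pair), 3), is_likely_false_positive(*pair))
```

In this toy setting, the pair P1 and P3 shares only the root term and is flagged, whereas P1 and P2 share the more specific "binding" term and is kept. In practice the cutoff would be tuned against a reference set of trusted interactions rather than fixed by hand.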
