Utilizing shared interacting domain patterns and Gene Ontology information to improve protein-protein interaction prediction

Protein-protein interactions (PPIs) play a significant role in many crucial cellular processes such as metabolism, signaling and regulation. Computational methods for predicting PPIs have grown tremendously in recent years, but problems such as high false positive rates have contributed to a lack of reliable PPI information. We aimed to increase the overlap between computational predictions and experimental results in order to remove some of the falsely predicted PPIs. In this study, we introduce a protein function predictor named PFP(), which is based on shared interacting domain patterns, to complement the Gene Ontology Annotations (GOA). We used GOA and PFP() as agents in a filtering process to reduce false positive pairs in computationally predicted PPI datasets. The functions predicted by PFP() were extracted from cross-species PPI data, both to assign novel functional annotations to uncharacterized proteins and to add functions to proteins already characterized by the Gene Ontology (GO). Incorporating PFP() increased the chance of finding a matching function annotation for the first rule of the filtering process by as much as 20%. To assess how well the proposed framework filters false PPIs, we applied it to the available S. cerevisiae PPIs and measured performance in two respects: the improvement, expressed as the signal-to-noise ratio (SNR), and the strength of improvement. The proposed filtering framework achieved significantly better performance on both metrics than was obtained without it.
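
The annotation-based filtering and SNR evaluation summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation; every detail here is an assumption made for illustration. The annotation dictionaries, the rule "keep a predicted pair only if both proteins share at least one GO term" (standing in for the first filtering rule, whose exact form the abstract does not state), the SNR definition as true positives divided by false positives against a reference set of experimentally supported pairs, and the improvement as SNR after filtering divided by SNR before are all hypothetical, as are the function names `merge_annotations`, `shares_term`, `filter_pairs` and `snr`. The "strength of improvement" metric is not modeled.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Annotations = Dict[str, Set[str]]


def merge_annotations(goa: Annotations, pfp: Annotations) -> Annotations:
    """Union each protein's GOA terms with its PFP-predicted terms (hypothetical inputs)."""
    return {p: goa.get(p, set()) | pfp.get(p, set()) for p in set(goa) | set(pfp)}


def shares_term(pair: Pair, annotations: Annotations) -> bool:
    """Assumed rule: both proteins carry at least one common GO term."""
    a, b = pair
    return bool(annotations.get(a, set()) & annotations.get(b, set()))


def filter_pairs(pairs: Iterable[Pair], annotations: Annotations) -> List[Pair]:
    """Keep only predicted pairs that satisfy the assumed shared-term rule."""
    return [pair for pair in pairs if shares_term(pair, annotations)]


def snr(pairs: List[Pair], reference: Iterable[Pair]) -> float:
    """Assumed SNR: true positives / false positives against a reference set.

    Pairs are treated as unordered, so (A, B) matches (B, A).
    Returns inf when there are no false positives.
    """
    known = {frozenset(p) for p in reference}
    tp = sum(1 for p in pairs if frozenset(p) in known)
    fp = len(pairs) - tp
    return tp / fp if fp else float("inf")


if __name__ == "__main__":
    # Toy data: D has no GOA annotation, so only PFP supplies a term for it.
    goa = {"A": {"GO:1"}, "B": {"GO:1"}, "C": {"GO:2"}, "E": {"GO:1"}}
    pfp = {"D": {"GO:2"}}
    predicted = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D"), ("A", "E")]
    reference = [("A", "B"), ("C", "D")]

    before = snr(predicted, reference)
    kept = filter_pairs(predicted, merge_annotations(goa, pfp))
    after = snr(kept, reference)
    print(kept)
    print(f"SNR before={before:.2f}, after={after:.2f}, improvement={after / before:.2f}")
```

On this toy data the filter keeps (A, B), (C, D) and (A, E), raising the assumed SNR from about 0.67 to 2.00. With GOA terms alone, the true pair (C, D) would be dropped because D is unannotated; adding the PFP-predicted term lets it pass the rule, which mirrors the abstract's point that PFP() increases the chance of finding a matching annotation.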
