Assessing and predicting protein interactions using both local and global network topological metrics.

High-throughput protein interaction data, with ever-increasing volume, are becoming the foundation of many biological discoveries. However, such data often have high false positive and false negative rates, so scalable methods for identifying these errors are desirable. In this paper, we develop a computational method to identify spurious interactions and missing interactions in high-throughput protein interaction data. Our method uses both local and global topological information about protein pairs, assigning a local interacting score and a global interacting score to every protein pair. The local interacting score is calculated from the common neighbors of the two proteins. The global interacting score is computed using globally interacting protein group pairs. The two scores are then combined into a final score, called LGTweight, that indicates how likely two proteins are to interact. We tested our method on the DIP yeast interaction dataset. The experimental results show that the interactions ranked highest by our method have higher functional homogeneity and localization coherence than those ranked highest by existing methods, and that our method also achieves higher sensitivity and precision under 5-fold cross-validation.
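
To make the idea of a common-neighbor score concrete, the sketch below scores every protein pair in a toy network by the Jaccard overlap of the two proteins' neighborhoods. This is only an illustrative stand-in: the abstract does not give the formula for the local interacting score, the global interacting score, or LGTweight, so the Jaccard measure, the function names (`build_adjacency`, `local_score`, `rank_pairs`), and the toy edges are all assumptions, not the paper's method.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_score(adj, u, v):
    """Jaccard overlap of the neighborhoods of u and v.

    Assumption: a generic common-neighbor measure, not the paper's formula.
    Each protein is excluded from the other's neighborhood so that an
    existing u-v edge does not influence the score.
    """
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)


def rank_pairs(adj):
    """Score all protein pairs and sort them from highest to lowest score."""
    proteins = sorted(adj)
    scored = [((u, v), local_score(adj, u, v))
              for u, v in combinations(proteins, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy network; the protein names are placeholders.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("A", "D"), ("B", "D"), ("D", "E")]
adj = build_adjacency(edges)

for pair, score in rank_pairs(adj):
    observed = pair in edges or pair[::-1] in edges
    print(pair, round(score, 3), "observed" if observed else "unobserved")
```

In this toy network the unobserved pair C-D shares most of its neighbors and would be a candidate missing interaction, while the observed edge D-E shares none and would be a candidate spurious interaction. A real scoring scheme would also need the global component described above.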
