A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity

MOTIVATION: Recent advances in technology have dramatically increased the availability of protein-protein interaction (PPI) data and stimulated the development of many methods for improving the systems-level understanding of the cell. However, those efforts have been significantly hindered by the high level of noise, the sparseness and the highly skewed degree distribution of PPI networks. Here, we present a novel algorithm to reduce the noise present in PPI networks. The key idea of our algorithm is that two proteins sharing higher-order topological similarities, as measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex.

RESULTS: Applying our algorithm to a yeast PPI network, we found that the edges in the reconstructed network have higher biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species and known protein complexes. Comparison with existing methods shows that the network reconstructed by our method has the highest quality. Using two independent graph clustering algorithms, we found that the reconstructed network significantly improves the prediction accuracy of protein complexes. Furthermore, our method is applicable to PPI networks obtained with different experimental systems, such as affinity purification, yeast two-hybrid (Y2H) and protein-fragment complementation assay (PCA), and evidence shows that the predicted edges are likely bona fide physical interactions. Finally, an application to a human PPI network increased the coverage of the network by at least 100%.

AVAILABILITY: www.cs.utsa.edu/~jruan/RWS/
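The key idea stated in the abstract, scoring protein pairs by random walk-based topological similarity and keeping the high-scoring pairs as edges, can be illustrated with a minimal sketch. This is not the published procedure: the abstract does not give its details, so the random walk with restart, the cosine similarity between visiting-probability profiles, and the parameters `restart`, `n_steps` and `threshold` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def rwr_profiles(adj, restart=0.5, n_steps=50):
    """Row i approximates the visiting probabilities of a random walk
    with restart that starts (and restarts) at node i.
    Assumption: a generic random walk with restart, not the paper's exact walk."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                 # avoid division by zero for isolated nodes
    P = adj / deg                       # row-stochastic transition matrix
    start = np.eye(adj.shape[0])        # one walker per start node
    prof = start.copy()
    for _ in range(n_steps):
        # Take one step with probability (1 - restart), else jump back to the start node.
        prof = (1 - restart) * prof @ P + restart * start
    return prof

def topological_similarity(adj, restart=0.5, n_steps=50):
    """Cosine similarity between the random-walk profiles of every node pair
    (cosine is an illustrative choice of similarity measure)."""
    prof = rwr_profiles(adj, restart, n_steps)
    norms = np.linalg.norm(prof, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = prof / norms
    return unit @ unit.T

def reconstruct(adj, threshold=0.3, restart=0.5, n_steps=50):
    """Return a 0/1 adjacency matrix that keeps node pairs whose
    similarity reaches the (hypothetical) threshold."""
    sim = topological_similarity(adj, restart, n_steps)
    np.fill_diagonal(sim, 0.0)          # ignore self-similarity
    return (sim >= threshold).astype(int)

# Toy network: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
toy = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    toy[u, v] = toy[v, u] = 1

sim = topological_similarity(toy)
rebuilt = reconstruct(toy)
```

In this toy network, proteins 0 and 1 sit in the same dense module and receive a much higher similarity than proteins 0 and 5, which lie in different modules. In practice the threshold would have to be tuned, for example so that the reconstructed network has a chosen number of edges.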
