A two-stage geometric method for detecting unreliable links in protein-protein networks

Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although high-throughput experimental techniques have generated a large amount of PPI data for different species, the data they produce contain a high proportion of spurious interactions. Hence, it is of great practical significance to develop reliable computational methods that help identify genuine PPIs. In this paper, we propose a new geometric approach called Leave-One-Out Logistic Metric Embedding (LOO-LME) for assessing the reliability of interactions. Unlike previous approaches, which mainly seek to preserve the noisy topological information of the PPI network in the embedding space, LOO-LME first transforms the learning task into an equivalent discriminant form and then deals directly with the uncertainty in PPI networks using a leave-one-out-style approach. The experimental results show that LOO-LME substantially outperforms previous methods on PPI assessment problems. LOO-LME could thus facilitate further graph-based studies of PPIs and may help uncover the biological knowledge hidden in them.
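The abstract does not state the LOO-LME objective, so the following is only a generic illustration of the underlying idea, not the authors' algorithm: embed the proteins as points, model the probability of an interaction with a logistic function of squared distance, and score each observed link with a model that was fitted without that link. The function names, the fixed `bias`, the plain gradient descent, and the toy network are all assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_embedding(adj, mask, dim=2, bias=1.0, lr=0.05, steps=300, seed=0):
    """Fit points X so that P(i interacts with j) = sigmoid(bias - ||x_i - x_j||^2).

    mask[i, j] is 1 for pairs that contribute to the loss and 0 for held-out pairs.
    This is a hypothetical logistic distance model, chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    X = 0.1 * rng.standard_normal((n, dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]        # pairwise differences, shape (n, n, dim)
        d2 = (diff ** 2).sum(axis=-1)               # squared distances
        p = sigmoid(bias - d2)                      # predicted link probabilities
        # Gradient of the masked negative log-likelihood with respect to X.
        grad = -2.0 * ((mask * (p - adj))[:, :, None] * diff).sum(axis=1)
        X -= lr * grad
    return X

def loo_edge_score(adj, i, j, bias=1.0, **fit_kwargs):
    """Leave-one-out-style score: fit without pair (i, j), then predict that pair."""
    mask = 1.0 - np.eye(adj.shape[0])
    mask[i, j] = mask[j, i] = 0.0                   # hide the link being assessed
    X = fit_embedding(adj, mask, bias=bias, **fit_kwargs)
    d2 = ((X[i] - X[j]) ** 2).sum()
    return float(sigmoid(bias - d2))

# Toy network (made up): two 4-node cliques plus one link (0, 7) between them.
adj = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for a in group:
        for b in group:
            if a != b:
                adj[a, b] = 1.0
adj[0, 7] = adj[7, 0] = 1.0

within_score = loo_edge_score(adj, 0, 1)   # link supported by shared neighbours
between_score = loo_edge_score(adj, 0, 7)  # link with no supporting structure
print(within_score, between_score)
```

In this toy setting the link inside a clique receives a higher held-out probability than the isolated link between the cliques, which is the sense in which such a score can flag potentially spurious interactions. Refitting once per link is expensive on real networks; the sketch makes no attempt to address that.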
