Assessing and predicting protein interactions by combining manifold embedding with multiple information integration

Background
Protein-protein interactions (PPIs) play crucial roles in virtually every aspect of cellular function within an organism. Over the last decade, novel high-throughput techniques have generated enormous amounts of data and provided valuable resources for studying protein interactions. However, these high-throughput interaction data often have high false positive and false negative rates. It is therefore highly desirable to develop scalable computational methods that identify these errors.

Results
We have developed a robust computational technique for assessing the reliability of interactions and predicting new interactions by combining manifold embedding with multiple information integration. We validated the proposed method with extensive experiments on densely connected and sparse yeast PPI networks. The results demonstrate that the interactions ranked highest by our method have high functional homogeneity and localization coherence.

Conclusions
Our proposed method outperforms existing methods in both assessing and predicting protein interactions. Furthermore, it is general enough to work on a variety of PPI networks, whether densely connected or sparse. The proposed algorithm is therefore a promising method for detecting both false positive and false negative interactions in PPI networks.
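The abstract names manifold embedding without describing it. As one illustration of what embedding a PPI network can look like, the sketch below uses an Isomap-style pipeline. It computes shortest-path (hop) distances between proteins, embeds them with classical multidimensional scaling, and treats Euclidean distance in the embedding as a plausibility score. Short distances suggest true interactions. Observed edges with large distances are suspected false positives, and non-adjacent pairs with small distances are candidate false negatives. This is a sketch under stated assumptions, not the authors' algorithm, and it does not reproduce their multiple-information integration step. The helper names (`hop_distances`, `classical_mds`, `assess_and_predict`, `shared_annotation_fraction`), the embedding dimension `k`, and the power-iteration eigensolver are hypothetical, illustrative choices. The last helper computes the fraction of pairs whose annotation sets overlap. This is one common reading of "functional homogeneity" when applied to GO terms and of "localization coherence" when applied to subcellular compartments, but that definition is also an assumption here.

```python
import math
import random
from collections import deque


def hop_distances(n, edges):
    """All-pairs shortest-path (hop) distances, one BFS per protein."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] == math.inf:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    if any(math.inf in row for row in dist):
        raise ValueError("this sketch assumes a connected network")
    return dist


def classical_mds(dist, k, iters=500, seed=0):
    """k-dimensional classical MDS of a distance matrix.

    The top-k eigenpairs of the double-centred squared-distance matrix are
    found by shifted power iteration with deflation (stdlib only).
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    # Shift by an upper bound on the spectral radius so all eigenvalues are >= 0
    # and power iteration finds the largest eigenvalue, not the largest magnitude.
    shift = max(sum(abs(x) for x in row) for row in b)
    a = [[b[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mu = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(mu - shift, 0.0))  # sqrt of the eigenvalue of b
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next pass converges to the next-largest eigenpair.
        a = [[a[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def assess_and_predict(n, edges, k=2, top=3):
    """Score observed edges and propose candidate interactions.

    Small embedding distance = more plausible. Observed edges at the end of
    `edge_scores` are suspected false positives; `candidates` are the `top`
    closest non-adjacent pairs (suspected false negatives).
    """
    coords = classical_mds(hop_distances(n, edges), k)
    observed = {frozenset(e) for e in edges}
    edge_scores = sorted((math.dist(coords[u], coords[v]), (u, v)) for u, v in edges)
    candidates = sorted(
        (math.dist(coords[u], coords[v]), (u, v))
        for u in range(n) for v in range(u + 1, n)
        if frozenset((u, v)) not in observed)
    return edge_scores, candidates[:top]


def shared_annotation_fraction(pairs, annotations):
    """Fraction of annotated pairs whose annotation sets overlap.

    With GO terms this approximates functional homogeneity; with subcellular
    compartments, localization coherence. Unannotated pairs are skipped.
    """
    usable = [(u, v) for u, v in pairs if annotations.get(u) and annotations.get(v)]
    if not usable:
        return 0.0
    hits = sum(1 for u, v in usable if annotations[u] & annotations[v])
    return hits / len(usable)
```

Consider a hypothetical five-protein chain `[(0, 1), (1, 2), (2, 3), (3, 4)]`. Its hop distances are already Euclidean on a line, so every observed edge gets an embedding distance of about 1, and the three closest non-adjacent pairs are the two-hop pairs at a distance of about 2. The pure-Python eigensolver costs O(k · iters · n²) and is only meant for toy networks. A real yeast interactome would call for a proper dense or sparse eigensolver.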
