PPIevo: protein-protein interaction prediction from PSSM-based evolutionary information.

Protein-protein interactions regulate a variety of cellular processes. Because experimental techniques for detecting these interactions have many limitations, there is a great need for computational methods that complement them by predicting protein interactions. Here, we introduce a novel evolution-based feature extraction algorithm for protein-protein interaction (PPI) prediction. The algorithm, called PPIevo, extracts evolutionary features from the Position-Specific Scoring Matrix (PSSM) of a protein with known sequence. It does not depend on protein annotations, and its features are based on the evolutionary history of the proteins. This gives the algorithm more power for predicting protein-protein interactions than many sequence-based algorithms. Results on the HPRD database show better performance and robustness for the proposed method. They also reveal that negative dataset selection can lead to a severe overestimation of performance, which is the principal drawback of the available methods.
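The abstract does not spell out how PPIevo derives its features, so the following is only a minimal sketch of one common way to turn a variable-length PSSM into a fixed-length vector, not the authors' method. It averages the PSSM rows grouped by the residue found at each position, giving 20 x 20 = 400 values per protein, and concatenates the vectors of both proteins to describe a pair. The function names `pssm_400` and `pair_features`, the grouping scheme, and the column order (taken here to follow the usual PSI-BLAST ASCII PSSM layout) are all assumptions made for illustration.

```python
# Illustrative only: a generic PSSM summary feature, NOT the PPIevo algorithm.
# Assumed column order of a PSI-BLAST ASCII PSSM.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_400(sequence, pssm):
    """Average PSSM rows grouped by the residue at each position.

    sequence: protein sequence as a string of one-letter codes.
    pssm: list of rows, one per residue, each with 20 scores.
    Returns a flat list of 400 values (20 residue types x 20 columns).
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")

    sums = {aa: [0.0] * 20 for aa in AMINO_ACIDS}
    counts = {aa: 0 for aa in AMINO_ACIDS}

    for residue, row in zip(sequence, pssm):
        if residue not in sums:  # skip non-standard residues such as 'X'
            continue
        counts[residue] += 1
        for j, score in enumerate(row):
            sums[residue][j] += score

    features = []
    for aa in AMINO_ACIDS:
        n = counts[aa]
        # Residue types absent from the sequence contribute zeros.
        features.extend(s / n if n else 0.0 for s in sums[aa])
    return features


def pair_features(seq_a, pssm_a, seq_b, pssm_b):
    """Describe a protein pair by concatenating both 400-value vectors."""
    return pssm_400(seq_a, pssm_a) + pssm_400(seq_b, pssm_b)
```

A vector built this way has the same length for every protein pair, so it can be passed to any standard classifier. Note that simple concatenation depends on the order of the two proteins; a symmetric combination would be a reasonable alternative.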
