Evaluation of Machine Learning Algorithms on Protein-Protein Interactions

Protein-protein interactions underlie the majority of biological processes. Many computational methods have been developed to predict protein-protein interactions from proteins’ sequence, structural and genomic data. This motivated us to perform a comparative study of several machine learning methods, training them on a set of known protein-protein interactions using the proteins’ global and local attributes. The classifiers were evaluated through cross-validation, and several performance measures were computed. The results show that the support vector machine outperformed the other classifiers. This finding was confirmed by the Wilcoxon rank-sum test at the 5% significance level.
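The statistical comparison described above can be illustrated with a minimal sketch of a two-sided Wilcoxon rank-sum test applied to per-fold cross-validation scores. This is not the authors' code: the function names, the normal approximation (without tie correction), and the accuracy values are all assumptions made for illustration, and the second classifier is a hypothetical stand-in.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). No tie correction is applied, so this is only a
    sketch suitable for mostly distinct scores.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return z, p


# Hypothetical per-fold accuracies from 10-fold cross-validation
# (illustrative numbers only, not results from the paper).
svm_acc = [0.900, 0.910, 0.920, 0.930, 0.940, 0.950, 0.905, 0.915, 0.925, 0.935]
other_acc = [0.800, 0.810, 0.820, 0.830, 0.840, 0.850, 0.805, 0.815, 0.825, 0.835]

z, p = rank_sum_test(svm_acc, other_acc)
alpha = 0.05  # the 5% significance level mentioned in the abstract
print(f"z = {z:.3f}, p = {p:.5f}, significant = {p < alpha}")
```

In practice, a library routine such as `scipy.stats.ranksums` would normally be used instead; the hand-written version is shown only to make the rank-based computation explicit.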
