Predicting Protein-Protein Interactions with K-Nearest Neighbors Classification Algorithm

In this work we address the problem of predicting protein-protein interactions. Solving it can give greater insight into complex diseases, such as cancer, and provides valuable information for the study of active small molecules for new drugs, limiting the number of molecules to be tested in the laboratory. We model the problem as a binary classification task, using a suitable coding of the amino acid sequences. We apply the k-Nearest Neighbors classification algorithm to the classes of interacting and noninteracting proteins. Results show that it is possible to achieve high prediction accuracy in cross-validation. A case study shows that a real network of thousands of interacting proteins can be reconstructed with high accuracy on standard hardware.
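The pipeline described above (encode each protein pair, classify with k-Nearest Neighbors, score by cross-validation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the sequence coding, so the sketch assumes a simple amino acid composition vector per protein, concatenated for the pair. The distance (Euclidean), the value of k, the leave-one-out scheme, and all function names are likewise hypothetical choices.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Assumed coding: fraction of each standard amino acid in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Represent a protein pair as the concatenation of both compositions."""
    return composition(seq_a) + composition(seq_b)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query.

    With two classes, an odd k avoids tied votes.
    """
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda item: math.dist(item[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each example once and classify it from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)
```

In use, each labeled pair (1 for interacting, 0 for noninteracting) would be passed through `encode_pair` to build `xs`, with the labels collected in `ys`. Note that `encode_pair(a, b)` and `encode_pair(b, a)` give different vectors, so a symmetric coding may be preferable in practice.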
