Machine learning based protein-protein interaction prediction using physical-chemical representations

Many proteins interact with other proteins to perform specific functions. Predicting those interactions is important for analyzing signaling pathways and for determining the role of a specific protein in certain diseases. This work proposes the use of Support Vector Machines (SVM) to predict protein-protein interactions from physical-chemical features taken from AAindex. The algorithm was trained with a set of over 10,000 positive interactions from the DIP database and the same number of negative interactions generated through random permutations. The results show that these features provide useful information for training and improve the quality of the classification. Additionally, tuning the SVM parameters with Particle Swarm Optimization significantly improves the performance of the classifier (greater than 70%) in comparison with recent studies.
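The abstract does not say how the random permutations were carried out, so the following is only a minimal sketch of one common way to build a balanced negative set: shuffle the second partner of every positive pair and keep the re-paired proteins that are not already known to interact. The function name `sample_negative_pairs`, its parameters, and the protein identifiers are hypothetical and are not taken from the original work.

```python
import random


def sample_negative_pairs(positive_pairs, seed=0, max_rounds=100):
    """Return up to len(positive_pairs) putative non-interacting pairs.

    Assumption (not from the paper): negatives are made by shuffling the
    second partner of each positive pair and discarding self-pairs,
    known positives, and duplicates.
    """
    rng = random.Random(seed)
    # Unordered pairs, so (A, B) and (B, A) count as the same interaction.
    known = {frozenset(pair) for pair in positive_pairs}
    left = [a for a, _ in positive_pairs]
    right = [b for _, b in positive_pairs]

    negatives, seen = [], set()
    for _ in range(max_rounds):
        rng.shuffle(right)  # random permutation of the second partners
        for a, b in zip(left, right):
            key = frozenset((a, b))
            if a == b or key in known or key in seen:
                continue
            seen.add(key)
            negatives.append((a, b))
            if len(negatives) == len(positive_pairs):
                return negatives
    # May be shorter than the positive set if too few candidates exist.
    return negatives


# Toy identifiers standing in for DIP entries.
positives = [("P1", "P2"), ("P3", "P4"), ("P5", "P6"), ("P7", "P8")]
negatives = sample_negative_pairs(positives, seed=1)
print(negatives)
```

Pairs produced this way are only assumed to be non-interacting, since a shuffled pair may correspond to a real interaction that is simply absent from the database.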
