Solvent Accessible Surface Area-Based Hot-Spot Detection Methods for Protein-Protein and Protein-Nucleic Acid Interfaces

Because hot-spot (HS) detection is important and computational methodologies are efficient, several HS detection approaches have been developed. This paper presents new models for predicting HS in protein-protein and protein-nucleic acid interactions, with better statistics than those currently reported in the literature. The models are based on solvent accessible surface area (SASA) and genetic conservation features, subjected to simple Bayes networks (protein-protein systems) and to a more complex multi-objective genetic algorithm-support vector machine algorithm (protein-nucleic acid systems). The best models for these interactions have been implemented in two free Web tools.
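
To make the idea of classifying residues from SASA and conservation features concrete, the following is a minimal sketch, not the models described in this paper. It assumes a Gaussian naive Bayes classifier (the simplest Bayesian network structure, swapped in here for illustration), a single SASA-derived feature (the surface area buried on complex formation), and a conservation score in [0, 1]. All function names, the "HS"/"NS" labels, and the toy values are hypothetical and chosen only for demonstration.

```python
import math
from statistics import mean, pvariance


def delta_sasa(sasa_monomer, sasa_complex):
    """Surface area buried when a residue goes from the free monomer to the complex."""
    return sasa_monomer - sasa_complex


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        columns = list(zip(*cls_rows))
        model[cls] = {
            "prior": len(cls_rows) / len(rows),
            "means": [mean(c) for c in columns],
            # Small floor keeps the variance strictly positive.
            "vars": [pvariance(c) + 1e-6 for c in columns],
        }
    return model


def predict(model, row):
    """Return the class with the highest log-posterior under the naive Bayes model."""
    def log_posterior(params):
        score = math.log(params["prior"])
        for x, m, v in zip(row, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda cls: log_posterior(model[cls]))


# Hypothetical toy data: (SASA in monomer, SASA in complex, conservation, label).
toy = [
    (120.0, 10.0, 0.90, "HS"),
    (110.0, 15.0, 0.80, "HS"),
    (130.0, 20.0, 0.85, "HS"),
    (90.0, 70.0, 0.30, "NS"),
    (100.0, 85.0, 0.40, "NS"),
    (80.0, 60.0, 0.20, "NS"),
]
rows = [(delta_sasa(m, c), cons) for m, c, cons, _ in toy]
labels = [entry[-1] for entry in toy]
model = fit_gaussian_nb(rows, labels)

# Classify an unseen, strongly buried and highly conserved residue.
print(predict(model, (delta_sasa(115.0, 12.0), 0.88)))
```

A real pipeline would compute SASA from structure files and conservation from sequence alignments, and would be evaluated on experimental alanine-scanning data; none of that is reproduced in this sketch.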
