SpotOn: High Accuracy Identification of Protein-Protein Interface Hot-Spots

We present SpotOn, a web server that identifies interfacial residues and classifies them as Hot-Spots (HS) or Null-Spots (NS). SpotOn implements a robust algorithm with a demonstrated accuracy of 0.95 and sensitivity of 0.98 on an independent test set. The predictor was developed using an ensemble machine learning approach with up-sampling of the minority class. It was trained on 53 complexes using a range of features derived from both protein 3D structure and sequence. The SpotOn web interface is freely available at: http://milou.science.uu.nl/services/SPOTON/.
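The training and evaluation ideas mentioned above (up-sampling the minority class, combining an ensemble, and reporting accuracy and sensitivity) can be illustrated with a small sketch. This is a hypothetical illustration, not SpotOn's implementation. It assumes that residues are labeled "HS" or "NS", that up-sampling means randomly duplicating minority-class samples with replacement until the classes are balanced, that ensemble members are combined by simple majority vote, and that HS is the positive class when computing sensitivity. The function names `upsample_minority`, `majority_vote`, and `accuracy_and_sensitivity` are illustrative only.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes (sampling with replacement)
    until every class has as many samples as the largest class.
    Assumption: this simple random up-sampling stands in for SpotOn's scheme."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.
    Assumption: majority voting is used here as an example combination rule;
    ties go to the label encountered first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


def accuracy_and_sensitivity(y_true, y_pred, positive="HS"):
    """Accuracy = (TP + TN) / N; sensitivity (recall) = TP / (TP + FN),
    treating `positive` (assumed to be HS) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return correct / len(pairs), sensitivity


# Toy data: 2 hot-spots versus 6 null-spots, one made-up feature per residue.
X = [[0.1], [0.2], [0.3], [0.9], [0.8], [0.7], [0.6], [0.5]]
y = ["HS", "HS", "NS", "NS", "NS", "NS", "NS", "NS"]
Xb, yb = upsample_minority(X, y)
print(yb.count("HS"), yb.count("NS"))  # 6 6

print(majority_vote([["HS", "NS"], ["HS", "HS"], ["NS", "NS"]]))  # ['HS', 'NS']
print(accuracy_and_sensitivity(["HS", "HS", "NS", "NS"], ["HS", "NS", "NS", "NS"]))  # (0.75, 0.5)
```

Balancing the classes before training keeps a learner from trivially favoring the more frequent NS label; sensitivity is reported alongside accuracy because it measures how many true hot-spots are recovered.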
