Incorporating post-translational modifications and unnatural amino acids into high-throughput modeling of protein structures

MOTIVATION: Accurately predicting protein side-chain conformations is an important subproblem of the broader protein structure prediction problem. Several methods can generate fairly accurate models for moderate-size proteins in seconds or less. However, a major limitation of these methods is that they cannot model post-translational modifications (PTMs) or unnatural amino acids. In living systems, the chemical groups added after translation are often critical to protein function. In engineered systems, unnatural amino acids are incorporated into proteins to explore structure-function relationships and to create novel proteins.

RESULTS: We present a new version of SIDEpro that predicts the side chains of proteins containing non-standard amino acids, including 15 of the PTMs most frequently observed in the Protein Data Bank and all types of phosphorylation. SIDEpro uses energy functions parameterized by neural networks trained on available data. For PTMs, the χ1 and χ1+2 accuracies are comparable to those obtained for the precursor amino acid, as are the RMSD values for the atoms shared with the precursor amino acid. In addition, SIDEpro can accommodate any PTM or unnatural amino acid, providing a flexible prediction system for high-throughput modeling of proteins beyond the standard amino acids.

AVAILABILITY AND IMPLEMENTATION: SIDEpro programs and Web server, rotamer libraries and data are available through the SCRATCH suite of protein structure predictors at http://scratch.proteomics.ics.uci.edu/
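Side-chain predictors are commonly scored by χ1 accuracy (fraction of residues whose first side-chain dihedral is predicted correctly), χ1+2 accuracy (χ1 and χ2 both correct), and RMSD over a chosen set of atoms. The Python sketch below shows one common way to compute these metrics. It is not SIDEpro's evaluation code. It assumes a 40° tolerance for a χ angle to count as correct, compares predicted and native side chains on a fixed backbone without superposition, and ignores side-chain symmetry (e.g. Phe/Tyr χ2). The function names and the phosphoserine atom names in the comments are illustrative only.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral (degrees) over four atoms, e.g. N-CA-CB-OG for chi1 of Ser/SEP."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracies(pred: Sequence[Sequence[float]],
                   native: Sequence[Sequence[float]],
                   tol: float = 40.0) -> Tuple[Optional[float], Optional[float]]:
    """Return (chi1 accuracy, chi1+2 accuracy); assumed 40-degree tolerance, no symmetry handling."""
    n1 = ok1 = n12 = ok12 = 0
    for p, t in zip(pred, native):
        if len(t) < 1:
            continue  # residues without chi angles (Gly, Ala) are skipped
        n1 += 1
        c1 = angle_diff(p[0], t[0]) <= tol
        ok1 += c1
        if len(t) >= 2:
            n12 += 1
            ok12 += (c1 and angle_diff(p[1], t[1]) <= tol)
    return (ok1 / n1 if n1 else None), (ok12 / n12 if n12 else None)


def shared_atom_rmsd(pred: Dict[str, Vec], native: Dict[str, Vec],
                     shared: Sequence[str]) -> float:
    """RMSD over atoms a modified residue shares with its precursor (no superposition).

    Example (hypothetical): for phosphoserine, shared could be ["CB", "OG"],
    excluding the added phosphate atoms.
    """
    sq = [sum((a - b) ** 2 for a, b in zip(pred[name], native[name])) for name in shared]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    print(chi_accuracies([[60.0, 170.0], [-65.0]], [[75.0, -175.0], [180.0]]))  # (0.5, 1.0)
```

In the usage line, the first residue's χ1 and χ2 are each off by 15°, so both count as correct. The second residue's χ1 is off by 115°, which gives a χ1 accuracy of 0.5 and a χ1+2 accuracy of 1.0 (only the first residue has a χ2). Comparing the shared-atom RMSD of a modified residue with that of its unmodified precursor is one way to make the kind of comparison the abstract describes.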
