Protein structure prediction using sparse NOE and RDC restraints with Rosetta in CASP13

Computational methods that produce accurate protein structure models from limited experimental data, e.g. from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predicted de novo using a two-stage protocol. First, low-resolution models were generated with the Rosetta de novo method guided by non-ambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and non-ambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (GDT-TS score >45) and in three cases high-resolution models were achieved (RMSD <3.5 Å). We also show that a meta-approach applying iterative Rosetta+NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.

[1]  Jens Meiler,et al.  Simultaneous prediction of protein secondary structure and transmembrane spans , 2013, Proteins.

[2]  Oliver F. Lange,et al.  Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples , 2012, Proceedings of the National Academy of Sciences.

[3]  Badri Adhikari,et al.  DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks , 2017 .

[4]  Daniel W. Kulp,et al.  Generalized Fragment Picking in Rosetta: Design, Protocols and Applications , 2011, PloS one.

[5]  D. Baker,et al.  Rapid protein fold determination using unassigned NMR data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Jinbo Xu,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016 .

[7]  Oliver F. Lange,et al.  NMR Structure Determination for Larger Proteins Using Backbone-Only Data , 2010, Science.

[8]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[9]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[10]  Georgios A. Pavlopoulos,et al.  Protein structure determination using metagenome sequence data , 2017, Science.

[11]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[12]  Jinbo Xu,et al.  Analysis of deep learning methods for blind protein contact prediction in CASP12 , 2018, Proteins.

[13]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[14]  Jeffrey J. Gray,et al.  Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. , 2003, Journal of molecular biology.

[15]  Jian Peng,et al.  Template-based protein structure modeling using the RaptorX web server , 2012, Nature Protocols.

[16]  David Baker,et al.  Protein structure prediction and analysis using the Robetta server , 2004, Nucleic Acids Res..

[17]  Chunli Yan,et al.  Integrative Modeling of Macromolecular Assemblies from Low to Near-Atomic Resolution , 2015, Computational and structural biotechnology journal.

[18]  Frank DiMaio,et al.  Protein structure prediction using Rosetta in CASP12 , 2018, Proteins.

[19]  Ad Bax,et al.  Validation of Protein Structure from Anisotropic Carbonyl Chemical Shifts in a Dilute Liquid Crystalline Phase , 1998 .

[20]  David E. Kim,et al.  One contact for every twelve residues allows robust and accurate topology‐level protein structure modeling , 2014, Proteins.

[21]  David Baker,et al.  Accurate protein structure modeling using sparse NMR data and homologous structure information , 2012, Proceedings of the National Academy of Sciences.

[22]  Yang Zhang,et al.  I-TASSER server for protein 3D structure prediction , 2008, BMC Bioinformatics.

[23]  Torsten Herrmann,et al.  Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. , 2002, Journal of molecular biology.

[24]  Yang Shen,et al.  Homology modeling of larger proteins guided by chemical shifts , 2015, Nature Methods.

[25]  David Baker,et al.  Protein Structure Prediction Using Rosetta , 2004, Numerical Computer Methods, Part D.

[26]  D. Baker,et al.  De novo protein structure determination using sparse NMR data , 2000, Journal of biomolecular NMR.

[27]  D. Baker,et al.  Alternate states of proteins revealed by detailed energy landscape mapping. , 2011, Journal of molecular biology.

[28]  David Baker,et al.  High-resolution comparative modeling with RosettaCM. , 2013, Structure.

[29]  Bonnie Berger,et al.  Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks , 2017, Cell systems.

[30]  Yang Zhang,et al.  THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes , 2018, Database J. Biol. Databases Curation.

[31]  David Baker,et al.  Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation , 2011, Proteins.

[32]  Alexandre M J J Bonvin,et al.  Information-driven modeling of large macromolecular assemblies using NMR data. , 2014, Journal of magnetic resonance.

[33]  Jie Hou,et al.  DNCON2: improved protein contact prediction using two-level deep convolutional neural networks , 2017, bioRxiv.

[34]  Robert Powers,et al.  A topology‐constrained distance network algorithm for protein structure determination from NOESY data , 2005, Proteins.

[35]  P. Güntert,et al.  Combined automated NOE assignment and structure calculation with CYANA , 2015, Journal of Biomolecular NMR.

[36]  Yang Zhang,et al.  Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field , 2012, Proteins.

[37]  Jens Meiler,et al.  ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. , 2011, Methods in enzymology.

[38]  Andriy Kryshtafovych,et al.  Assessment of contact predictions in CASP12: Co‐evolution and deep learning coming of age , 2017, Proteins.

[39]  Juergen Haas,et al.  The Protein Model Portal—a comprehensive resource for protein structure and model information , 2013, Database J. Biol. Databases Curation.

[40]  Oliver F. Lange,et al.  Consistent blind protein structure generation from NMR chemical shift data , 2008, Proceedings of the National Academy of Sciences.

[41]  David T. Jones,et al.  High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features , 2018, Bioinform..

[42]  Yang Zhang,et al.  How significant is a protein structure similarity with TM-score = 0.5? , 2010, Bioinform..

[43]  Debora S. Marks,et al.  Protein structure determination by combining sparse NMR data with evolutionary couplings , 2015, Nature Methods.

[44]  David Baker,et al.  Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11 , 2016, Proteins.

[45]  Yang Zhang,et al.  The I-TASSER Suite: protein structure and function prediction , 2014, Nature Methods.

[46]  D. Baker,et al.  De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. , 2002, Journal of the American Chemical Society.

[47]  Torsten Schwede,et al.  SWISS-MODEL: homology modelling of protein structures and complexes , 2018, Nucleic Acids Res..