High-Accuracy Protein Structures by Combining Machine-Learning with Physics-Based Refinement

Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, machine-learning models based on distance restraints from co-evolutionary analysis have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available, without using such information directly. Here we show that combining machine-learning-based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structure, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.
