Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling

Homology modeling is currently the most frequently used approach for protein structure prediction. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction, and the most widely employed tool for this task is MODELLER. This program implements the “modeling by satisfaction of spatial restraints” strategy, and its core algorithm has not been altered significantly since the early 1990s. In this work, we explore modifying MODELLER with two effective yet computationally light strategies to improve its 3D modeling performance. First, we investigate how the accuracy with which the structural variability between a target protein and its templates is estimated, expressed as σ values, profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average only weakly correlated with the true level of structural divergence between target-template pairs, and that increasing this correlation greatly improves the program’s predictions, especially in multiple-template modeling. Second, we examine how incorporating statistical potential terms (such as the DOPE potential) into MODELLER’s objective function affects 3D modeling quality: it provides a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules implementing this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is considerable room for improving MODELLER in terms of 3D modeling quality, and we propose strategies that could be pursued to further increase its performance.
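The role of the σ values in multiple-template modeling can be illustrated with a minimal sketch. It assumes that a homology-derived restraint on a single interatomic distance is modeled as the negative logarithm of an equally weighted mixture of Gaussians, one per template, each centered on the distance observed in that template and with width σ. The function names, the equal weights, and the numerical values are hypothetical and chosen only for illustration; this is not MODELLER or altmod code.

```python
import math


def gaussian_pdf(d, mean, sigma):
    """Density of a normal distribution with the given mean and sigma at d."""
    z = (d - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(d, template_distances, sigmas):
    """Negative log of an equally weighted Gaussian mixture (assumed form).

    Each template contributes one Gaussian centered on its own distance;
    a small sigma expresses high confidence in that template.
    """
    n = len(template_distances)
    density = sum(
        gaussian_pdf(d, m, s) / n for m, s in zip(template_distances, sigmas)
    )
    # Floor the density to avoid log(0) far away from every template.
    return -math.log(max(density, 1e-300))


def best_distance(template_distances, sigmas, lo=0.0, hi=20.0, step=0.01):
    """Grid search for the distance that minimizes the restraint penalty."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda d: restraint_penalty(d, template_distances, sigmas))


# Hypothetical case: two templates report 6.0 and 8.0 angstroms for the same
# atom pair, and the first template is the one structurally closer to the target.
templates = [6.0, 8.0]

# Uninformative sigmas: both templates are trusted equally, so the optimum
# falls roughly midway between the two template distances.
equal = best_distance(templates, sigmas=[1.5, 1.5])

# Sigmas that track the true divergence: the closer template gets a small
# sigma, so the optimum moves toward its distance.
calibrated = best_distance(templates, sigmas=[0.3, 2.0])

print(f"equal sigmas:      optimum near {equal:.2f} A")
print(f"calibrated sigmas: optimum near {calibrated:.2f} A")
```

In this toy setting the only difference between the two runs is the set of σ values, which is enough to move the preferred distance from the average of the templates to the distance of the more reliable one.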
