Assessing the utility of CASP14 models for molecular replacement

The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real‐world application. In CASP7, the metric for molecular replacement assessment involved full likelihood‐based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood‐based rigid‐body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined log‐likelihood‐gain (LLG) score. This enabled multi‐copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative‐expected‐LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X‐ray, NMR or cryo‐EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
