Assessing the utility of CASP14 models for molecular replacement

The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real-world application. In CASP7, the metric for molecular replacement assessment involved full likelihood-based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood-based rigid-body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined likelihood (LLG) score. This enabled multi-copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative-expected-LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X-ray, NMR or cryo-EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
