Assessing and maximizing data quality in macromolecular crystallography.

The quality of macromolecular crystal structures depends, in part, on the quality and quantity of the data used to produce them. Here, we review recent shifts in our understanding of how to use data quality indicators to select a high-resolution cutoff that leads to the best model, and of the potential to greatly increase data quality by merging multiple measurements, whether from multiple passes over a single crystal or from multiple crystals. Two key factors support this shift: the introduction of more robust correlation-coefficient-based indicators of the precision of merged data sets, and the recognition that substantial useful information is present in extensive amounts of data once considered too weak to be of value.
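As an illustration of what a correlation-coefficient-based precision indicator can look like, the sketch below computes a half-dataset correlation of the kind commonly called CC1/2: the measurements of each unique reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of averages is reported. The abstract does not specify any particular indicator or implementation, so the function names (`cc_half`, `cc_star`), the dictionary input layout, and the handling of non-positive values are assumptions made for this example only. The derived quantity CC* = sqrt(2·CC1/2 / (1 + CC1/2)) is included as it is usually defined.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cc_half(observations, seed=0):
    """Half-dataset correlation for merged intensities (illustrative sketch).

    observations: dict mapping a unique-reflection key to a list of repeated
    intensity measurements (a hypothetical layout chosen for this example).
    Reflections measured fewer than twice cannot be split and are skipped.
    """
    rng = random.Random(seed)
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        shuffled = list(intensities)
        rng.shuffle(shuffled)          # random assignment to the two halves
        mid = len(shuffled) // 2
        half_a.append(mean(shuffled[:mid]))
        half_b.append(mean(shuffled[mid:]))
    return pearson(half_a, half_b)


def cc_star(cc12):
    """Estimate of the correlation of the merged data with the true signal.

    Returns 0.0 for non-positive input; that clamp is an assumption of this
    sketch, since the formula is only meaningful for positive CC1/2.
    """
    if cc12 <= 0:
        return 0.0
    return (2 * cc12 / (1 + cc12)) ** 0.5


if __name__ == "__main__":
    data = {
        (1, 0, 0): [105.0, 98.0, 101.0, 96.0],
        (1, 1, 0): [12.0, 15.0, 9.0, 14.0],
        (2, 0, 1): [55.0, 61.0, 58.0, 52.0],
        (2, 1, 1): [3.0],              # single measurement, skipped
    }
    cc12 = cc_half(data)
    print(cc12, cc_star(cc12))
```

In practice such a statistic is evaluated per resolution shell, so that the decline of the correlation with resolution can inform the choice of a high-resolution cutoff; the sketch omits shell binning for brevity.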
