The predictive power of data-processing statistics

Machine learning applied to the data-analysis statistics reported by crystallographic software can predict the chances of experimental phasing success.
