Classification of crystallization outcomes using deep convolutional neural networks

The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly half a million annotated images of macromolecular crystallization experiments from various sources and setups. Here, state-of-the-art machine learning algorithms are trained and tested on different parts of this data set. We find that more than 94% of the test images can be correctly labeled, irrespective of their experimental origin. Because crystal recognition is key to high-density screening and the systematic analysis of crystallization experiments, this approach opens the door to both industrial and fundamental research applications.
