Protein crystallization analysis on the World Community Grid

We have developed an image-analysis and classification system that automatically scores images from high-throughput protein crystallization trials. Image analysis is performed by the Help Conquer Cancer (HCC) project on the World Community Grid. HCC computes 12,375 distinct image features from microbatch-under-oil images from the Hauptman-Woodward Medical Research Institute’s High-Throughput Screening Laboratory. Using the HCC-computed features and a training set of 165,351 hand-scored images, we have trained several Random Forest classifiers that recognize a range of crystallization outcomes, including crystals, clear drops, and precipitate. The system correctly recognizes 80% of crystal-bearing images, 89% of precipitate images, and 98% of clear drops.
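The per-outcome figures quoted above read as per-class recognition rates: of the images hand-scored with a given outcome, the fraction the classifier assigns to that same outcome. The sketch below shows how such rates could be computed from hand-scored and predicted labels. It is a minimal illustration only; the function name `per_class_recall` and the toy label lists are hypothetical and are not data or code from the HCC system.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Map each true label to the fraction of its images predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of images hand-scored with each outcome.
    totals = Counter(true_labels)
    # Number of images where the prediction matches the hand score.
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy labels, not results from the paper.
truth = ["crystal", "crystal", "clear", "clear",
         "precipitate", "precipitate", "precipitate", "crystal"]
predicted = ["crystal", "precipitate", "clear", "clear",
             "precipitate", "precipitate", "clear", "crystal"]

recall = per_class_recall(truth, predicted)
for label, value in sorted(recall.items()):
    print(f"{label}: {value:.0%}")
```

Reporting one rate per outcome rather than a single overall accuracy keeps a frequent class such as clear drops from masking weaker performance on a rarer one such as crystals.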
