Protein crystallization image classification with elastic net

Protein crystallization plays a crucial role in pharmaceutical research by supporting the investigation of a protein’s molecular structure through X-ray diffraction of its crystal. Because crystals occur rarely, images must be inspected manually, a laborious process. We develop a solution that incorporates a regularized logistic regression model to evaluate these images automatically. Standard image features, such as shape context, Gabor filters and Fourier transforms, are first extracted to represent the heterogeneous appearance of our images. The proposed solution then uses Elastic Net to select relevant features. Its L1-regularization mitigates the effects of our large dataset, and its L2-regularization ensures proper operation when the number of features exceeds the number of samples. A two-tier cascade classifier based on naïve Bayes and random forest algorithms then categorizes the images. To validate the proposed method, we compare it experimentally with naïve Bayes, linear discriminant analysis, random forest, and their two-tier cascade classifiers, using 10-fold cross-validation. Our experimental results demonstrate a three-category accuracy of 74%, outperforming the other models. In addition, Elastic Net better reduces the false negatives, which carry a high, domain-specific risk. To the best of our knowledge, this is the first attempt to apply Elastic Net to classifying protein crystallization images. Performance measured on a large pharmaceutical dataset also compares favorably with results reported in previous studies, and the reduction of high-risk false negatives is promising.
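As an illustration of how Elastic Net regularization can select features in a logistic regression model, the following is a minimal sketch and not the authors' implementation. It assumes a binary problem (the paper uses three categories), a proximal gradient solver, and a tiny hand-made dataset. The function names, the parameters `lam`, `alpha`, `step` and `iters`, and the data are all hypothetical choices made for this example. The penalty is written as lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), which is one common parameterization.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.5, iters=200):
    """Proximal gradient descent for binary logistic regression with the
    penalty lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    The intercept b is left unpenalized (an assumption of this sketch)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        # Residuals p_i - y_i under the current model.
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        new_w = []
        for j in range(d):
            # Gradient of the smooth part: mean log-loss plus the L2 term.
            grad = sum(e * row[j] for e, row in zip(err, X)) / n
            grad += lam * (1.0 - alpha) * w[j]
            # Gradient step, then L1 shrinkage; this can set weights exactly to zero.
            new_w.append(soft_threshold(w[j] - step * grad, step * lam * alpha))
        b -= step * sum(err) / n
        w = new_w
    return w, b

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is uninformative.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]

w, b = fit_elastic_net_logistic(X, y)
print("weights:", w, "intercept:", b)
```

In this toy setting the weight on the uninformative feature stays at zero while the informative feature keeps a positive weight, which is the feature-selection behavior attributed to the L1 term above. The L2 term keeps the problem well conditioned when features are correlated or outnumber the samples.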
