Phenotype Recognition with Combined Features and Random Subspace Classifier Ensemble

Background
Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods that can handle large image data sets are therefore required for automatic cellular phenotype identification. In this paper we investigate an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with the multilayer perceptron (MLP) as the base classifier is then used for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been used extensively in image processing applications. The curvelet transform gives a sparser representation of an image than the wavelet transform, offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. A combined feature description from Haralick features and the curvelet transform can further increase classification accuracy by exploiting their complementary information. We then investigate the applicability of the random subspace (RS) ensemble method to phenotype classification based on microscopy images. Each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.

Results
Experimental results on phenotype recognition from three benchmark image sets, HeLa, CHO and RNAi, show the effectiveness of the proposed approach. The combined feature gives better classification accuracy than either individual feature. The ensemble model gives better classification performance than its component neural networks. For the three image sets HeLa, CHO and RNAi, the random subspace ensembles achieve classification rates of 91.20%, 98.86% and 91.03% respectively, which compare favorably with the published results of 84%, 93% and 82% from the multi-purpose image classifier WND-CHARM, which applies wavelet transforms and other feature extraction methods. We investigated the estimation of the ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.

Conclusions
The multiscale and multidirectional character of the curvelet transform suits the description of microscopy images very well. It is empirically demonstrated that the curvelet-based feature is clearly preferable to the wavelet-based feature for bioimage description. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
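
The random subspace procedure summarized in the abstract (train each base classifier on a randomly sampled subset of the features, then combine the predictions by majority voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses an MLP as the base classifier, whereas the sketch swaps in a simple nearest-centroid classifier so that it depends only on NumPy. The names `NearestCentroid`, `RandomSubspaceEnsemble`, `make_toy_data`, `n_members` and `subspace_dim` are hypothetical, and the data are synthetic rather than Haralick or curvelet features.

```python
import numpy as np


class NearestCentroid:
    """Stand-in base classifier (assumption: the paper uses an MLP instead)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One mean vector per class, stacked as (n_classes, n_features).
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


class RandomSubspaceEnsemble:
    """Each member sees a random feature subset; labels come from majority voting."""

    def __init__(self, make_base, n_members=10, subspace_dim=5, seed=0):
        self.make_base = make_base        # zero-argument factory for a base classifier
        self.n_members = n_members        # ensemble size
        self.subspace_dim = subspace_dim  # number of features per member
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_members):
            # Sample feature indices without replacement for this member.
            idx = rng.choice(n_features, size=self.subspace_dim, replace=False)
            model = self.make_base().fit(X[:, idx], y)
            self.members_.append((idx, model))
        return self

    def predict(self, X):
        # votes has shape (n_members, n_samples).
        votes = np.stack([model.predict(X[:, idx]) for idx, model in self.members_])
        # counts has shape (n_classes, n_samples); ties go to the lowest class label.
        counts = np.stack([(votes == c).sum(axis=0) for c in self.classes_])
        return self.classes_[counts.argmax(axis=0)]


def make_toy_data(n_per_class=60, n_features=20, seed=1):
    """Synthetic three-class Gaussian data standing in for image feature vectors."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=2.0, size=(3, n_features))
    X = np.concatenate(
        [rng.normal(loc=m, size=(n_per_class, n_features)) for m in means]
    )
    y = np.repeat(np.arange(3), n_per_class)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    ensemble = RandomSubspaceEnsemble(NearestCentroid, n_members=7, subspace_dim=5)
    ensemble.fit(X, y)
    print("training accuracy:", (ensemble.predict(X) == y).mean())
```

The two parameters discussed in the Results, the dimensionality of the feature subsets and the ensemble size, correspond here to `subspace_dim` and `n_members`. Replacing `NearestCentroid` with any object that offers `fit` and `predict` would bring the sketch closer to the MLP-based ensemble described in the paper.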
