Machine Learning Improves the Precision and Robustness of High-Content Screens

Imaging-based high-content screens often rely on single-cell evaluation of phenotypes in large data sets of microscopic images. Traditionally, these screens are analyzed by extracting a few image-related parameters and using their ratios (linear single or multiparametric separation) to classify the cells into phenotypic classes. In this study, the authors show that machine learning–based classification of individual cells outperforms these classical ratio-based techniques. Using fluorescence intensity, morphological, and texture features, they evaluated how the performance of the data analysis improves as the number of features increases. Their findings are based on a case study involving an siRNA screen that monitors nucleoplasmic and nucleolar accumulation of a fluorescently tagged reporter protein. For the analysis, they developed a complete workflow incorporating image segmentation, feature extraction, cell classification, hit detection, and visualization of the results. For the classification task, the authors established a new graphical framework, the Advanced Cell Classifier, which provides accurate high-content screen analysis with minimal user interaction and offers access to a variety of advanced machine learning methods.
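The contrast between ratio-based separation and multi-feature classification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three feature names, the toy cell values, the ratio threshold, and the nearest-centroid classifier, which stands in for the machine learning methods mentioned above and is not the Advanced Cell Classifier's implementation. The toy values were chosen by hand so that two cells lie on the wrong side of the ratio threshold; they are not data from the screen.

```python
from statistics import mean

# Hypothetical per-cell feature vectors (illustrative values, not screen data):
# (nucleolar_intensity, nucleoplasmic_intensity, texture_score)

def ratio_classify(cell, threshold=1.0):
    """Classical approach: threshold one ratio of two intensity features."""
    nucleolar, nucleoplasmic, _texture = cell
    return "accumulation" if nucleoplasmic / nucleolar > threshold else "normal"

def fit_centroids(cells, labels):
    """Compute the mean feature vector of each phenotypic class."""
    centroids = {}
    for label in set(labels):
        members = [c for c, l in zip(cells, labels) if l == label]
        centroids[label] = tuple(mean(column) for column in zip(*members))
    return centroids

def centroid_classify(cell, centroids):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(cell, centroids[label]))

def accuracy(predicted, truth):
    """Fraction of cells whose predicted label matches the annotated label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Toy annotated training cells (assumed values).
train_cells = [(1.0, 0.5, 0.1), (1.2, 0.6, 0.2), (1.0, 1.5, 0.8), (1.1, 1.6, 0.9)]
train_labels = ["normal", "normal", "accumulation", "accumulation"]

# Toy held-out cells; the first and last are ambiguous on the ratio alone.
test_cells = [(1.0, 0.9, 0.9), (1.0, 0.4, 0.1), (1.0, 1.4, 0.8), (1.0, 1.1, 0.1)]
test_labels = ["accumulation", "normal", "accumulation", "normal"]

centroids = fit_centroids(train_cells, train_labels)
ratio_acc = accuracy([ratio_classify(c) for c in test_cells], test_labels)
centroid_acc = accuracy([centroid_classify(c, centroids) for c in test_cells], test_labels)
print(f"ratio threshold: {ratio_acc:.2f}, multi-feature centroid: {centroid_acc:.2f}")
```

On this constructed example the texture feature resolves the two cells that the intensity ratio misclassifies. It only shows the mechanism by which additional features can help, not the size of the effect reported in the study.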
