Data Analysis Approaches for High Content Screening

Experimental methods in cell biology combine access to detailed biological information with manageable cost. Recent technological advances have further enhanced this capacity, allowing cells to be interrogated in the automated, high-throughput mode that the pharmaceutical industry requires. While the technology itself has advanced tremendously, the accompanying statistical data analysis, as often practiced, has lagged behind. To address this issue, we embarked on a program whose goals are to identify the key aspects of data analysis and to develop appropriate statistical tools. We report several methods that we have adopted and found to be general and broadly applicable, together with examples of their use. These include: the representation of cell-level information through quantiles of population distributions; the capture of discrete processes and cell subpopulations by mixture modeling; the construction of quantile scores and composite decision rules for analyses comprising multiple criteria; and robust multivariate supervised classification for hit selection with discovery of important features. No two studies are identical, however, and an array of statistical methods, simple as well as complex, is needed, together with a flexible approach that carefully matches the tools to the scientific questions at hand.
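
To make the first of these ideas concrete, here is a minimal sketch of how cell-level measurements from a single well could be summarized by quantiles of their distribution instead of by a single mean. It is an illustration under stated assumptions, not the authors' implementation: the function name `well_quantile_profile`, the chosen quantile levels, and the example intensity values are all hypothetical.

```python
from statistics import quantiles

def well_quantile_profile(cell_values, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Summarize one well's cell-level measurements by selected quantiles.

    Hypothetical helper for illustration; the quantile levels in `probs`
    are an assumption, not values prescribed by the source.
    """
    if len(cell_values) < 2:
        raise ValueError("need at least two cells per well")
    # With n=100, statistics.quantiles returns the 99 percentile cut points,
    # so the k-th percentile sits at index k - 1.
    cuts = quantiles(cell_values, n=100, method="inclusive")
    return {p: cuts[round(p * 100) - 1] for p in probs}

# Made-up per-cell intensities: in the treated well only a minority of
# cells respond, which mainly shifts the upper quantiles.
control = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 1.05, 1.1]
treated = [1.0, 1.1, 0.9, 1.2, 1.0, 3.5, 4.0, 3.8]

control_profile = well_quantile_profile(control)
treated_profile = well_quantile_profile(treated)
for p in control_profile:
    print(f"q{p:.2f}: control={control_profile[p]:.2f} treated={treated_profile[p]:.2f}")
```

Comparing whole quantile profiles in this way can reveal a responding subpopulation that a well-level mean or median would blur; whether a given quantile is informative depends on the assay and the question being asked.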
