Exploratory analysis of cell-based screening data for phenotype identification in drug-siRNA study

Most phenotype-identification methods in cell-based screening assume prior knowledge of the expected phenotypes or require intricate parameter setting. They are useful when the analysis targets known phenotype properties, but there is also a need to explore, with minimal presumptions, the potentially interesting phenotypes that can be derived from the data. We present a method for this exploration that uses clustering to eliminate the need for phenotype labelling and GUI visualisation to facilitate parameter setting. The steps are outlier removal, cell clustering, and interactive visualisation for phenotype refinement. For drug-siRNA studies, we introduce an auto-merging procedure to reduce phenotype redundancy. We validated the method on two Golgi apparatus screens and showcase its contribution to a better understanding of screening images.
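To make the pipeline concrete, the non-interactive steps (outlier removal, cell clustering, merging of redundant clusters) might be sketched as below. This is only an illustrative sketch under stated assumptions: the abstract does not name the algorithms used, so the robust z-score filter, the plain k-means step, and the centroid-distance merge rule are stand-ins, and every function name and parameter (`remove_outliers`, `kmeans`, `merge_close_clusters`, `thresh`, `merge_dist`) is hypothetical. The GUI refinement step is omitted.

```python
import numpy as np

# NOTE: all algorithms below are assumed stand-ins, not the authors' method.

def remove_outliers(X, thresh=3.5):
    """Drop cells whose robust z-score (median/MAD) exceeds thresh in any feature."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, 1.0, mad)          # avoid division by zero
    robust_z = 0.6745 * np.abs(X - med) / mad
    keep = (robust_z <= thresh).all(axis=1)
    return X[keep], keep

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; deliberately over-clusters when k is set high."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            if np.any(labels == j):             # leave empty clusters in place
                centroids[j] = X[labels == j].mean(axis=0)
    return assign(centroids), centroids

def merge_close_clusters(labels, centroids, merge_dist):
    """Merge clusters whose centroids lie closer than merge_dist (transitively)."""
    k = len(centroids)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if np.linalg.norm(centroids[i] - centroids[j]) < merge_dist:
                parent[find(j)] = find(i)

    roots = np.array([find(i) for i in range(k)])
    _, merged_ids = np.unique(roots, return_inverse=True)
    return merged_ids[labels]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic per-cell feature vectors: two phenotypes plus one artefact.
    cells = np.vstack([
        rng.normal(0.0, 0.1, size=(50, 2)),
        rng.normal(10.0, 0.1, size=(50, 2)),
        [[1000.0, 1000.0]],
    ])
    clean, _ = remove_outliers(cells)
    labels, centroids = kmeans(clean, k=4)
    phenotypes = merge_close_clusters(labels, centroids, merge_dist=3.0)
    print("clusters before merge:", len(np.unique(labels)))
    print("phenotypes after merge:", len(np.unique(phenotypes)))
```

Over-clustering first and merging afterwards mirrors the idea of reducing phenotype redundancy; in practice the merge threshold is exactly the kind of parameter that an interactive visualisation would help to set.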
