Benchmarking of Multivariate Similarity Measures for High-Content Screening Fingerprints in Phenotypic Drug Discovery

High-content screening (HCS) is a powerful tool for drug discovery, capable of measuring cellular responses to chemical perturbation in a high-throughput manner. HCS provides an image-based, highly multiplexed, and quantitative readout of cellular phenotypes, including features such as shape, intensity, and texture. The resulting feature vectors can be used to characterize phenotypes and are therefore referred to as HCS fingerprints. Systematic analysis of HCS fingerprints allows objective computational comparison of cellular responses, which in turn facilitates the detection of compounds with distinct phenotypic outcomes in high-throughput HCS campaigns. Feature selection methods and similarity measures, which form the basis for phenotype identification and clustering, are critical to the quality of such computational analyses. We systematically evaluated 16 similarity measures in combination with linear and nonlinear feature selection methods for their potential to capture biologically relevant image features. Nonlinear correlation-based similarity measures such as Kendall’s τ and Spearman’s ρ perform well in most evaluation scenarios, outperforming other frequently used metrics such as the Euclidean distance. We also present four novel modifications of the connectivity map similarity that surpass the original version in our experiments. This study provides a basis for generic phenotypic analysis in future HCS campaigns.
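To make the contrast between rank-based correlation and Euclidean distance concrete, the following is a minimal sketch in plain Python. It is not the evaluation pipeline used in the study: the three fingerprints are invented toy values, the function names are illustrative, and Kendall's τ is computed in its simple tau-a form without a tie correction.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Straight-line distance between two fingerprints of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy fingerprints (invented values, five features each).
reference = [0.1, 0.5, 0.9, 1.4, 2.0]
scaled = [v * 10 for v in reference]    # same feature pattern, stronger response
shuffled = [2.0, 0.1, 1.4, 0.5, 0.9]    # same values, different feature pattern

for name, other in (("scaled", scaled), ("shuffled", shuffled)):
    print(
        name,
        "euclidean=%.2f" % euclidean_distance(reference, other),
        "spearman=%.2f" % spearman_rho(reference, other),
        "kendall=%.2f" % kendall_tau(reference, other),
    )
```

In this toy case the scaled fingerprint is farther from the reference in Euclidean terms than the shuffled one, even though it preserves the feature ordering exactly, while both rank-based measures score it as a perfect match. This is only an illustration of how the measures can disagree, not evidence for the benchmark results reported above.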
