Fully unsupervised deep mode of action learning for phenotyping high-content cellular images

Identifying and discovering phenotypes from high-content screening (HCS) images is a challenging task. Earlier work extracts biological features with image analysis pipelines, relies on supervised training, or generates features with neural networks pretrained on non-cellular images. We introduce a novel, fully unsupervised deep learning algorithm that clusters cellular images with similar mode of action (MOA) using only the images' pixel intensity values as input. The method outperforms existing approaches on the labelled subset of the BBBC021 dataset, achieving an accuracy of 97.09% when classifying MOA by nearest-neighbor matching. A distinctive aspect of the approach is that it can be trained on the entire unannotated dataset, correctly clusters similar treatments beyond the annotated subset, and can be used for novel MOA discovery.
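To make the evaluation step concrete, the following is a minimal sketch of MOA classification by nearest-neighbor matching. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure or the hold-out scheme, so cosine similarity, leave-one-out matching, and the optional exclusion of same-compound neighbors are assumptions, and the function name `nearest_neighbor_moa_accuracy` and the toy data are hypothetical.

```python
import numpy as np


def nearest_neighbor_moa_accuracy(embeddings, moa_labels, compounds=None):
    """Leave-one-out 1-nearest-neighbor MOA accuracy.

    embeddings : (n_treatments, n_features) array of treatment profiles.
    moa_labels : length-n sequence of MOA labels.
    compounds  : optional length-n sequence of compound names; if given,
                 neighbors from the same compound are excluded
                 (an assumed protocol, not stated in the abstract).
    """
    X = np.asarray(embeddings, dtype=float)
    # Cosine similarity (assumed metric): normalize rows, then dot product.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # A treatment must never match itself.
    np.fill_diagonal(sim, -np.inf)
    if compounds is not None:
        compounds = np.asarray(compounds)
        same_compound = compounds[:, None] == compounds[None, :]
        sim[same_compound] = -np.inf
    nearest = sim.argmax(axis=1)
    labels = np.asarray(moa_labels)
    # Fraction of treatments whose nearest neighbor shares their MOA.
    return float(np.mean(labels[nearest] == labels))


# Toy example with made-up 2-D profiles for four treatments.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_moas = ["A", "A", "B", "B"]
toy_accuracy = nearest_neighbor_moa_accuracy(toy_embeddings, toy_moas)
```

In practice the rows of `embeddings` would be treatment-level profiles derived from the learned image features, for example by averaging over all images of one treatment; that aggregation step is likewise an assumption here.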
