The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers

Understanding whether classifiers generalize to out-of-sample datasets is a central problem in machine learning. Microscopy images provide a standardized way to measure the generalization capacity of image classifiers, as we can image the same classes of objects under increasingly divergent but controlled factors of variation. We created a public dataset of 132,209 images of mouse cells, COOS-7 (Cells Out Of Sample 7-Class). COOS-7 provides a classification setting in which four test datasets have increasing degrees of covariate shift: some images are random subsets of the training data, while others come from experiments reproduced months later and imaged by different instruments. We benchmarked a range of classification models using different representations, including transferred neural network features, end-to-end classification with a supervised deep CNN, and features from a self-supervised CNN. While most classifiers performed well on test datasets similar to the training dataset, all classifiers failed to generalize their performance to datasets with greater covariate shifts. These baselines highlight the challenges that covariate shifts pose in image data, and establish metrics for improving the generalization capacity of image classifiers.
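As a rough illustration of this evaluation protocol (train once, then score the same classifier on several test sets with progressively larger covariate shift), here is a minimal sketch in Python. It does not use COOS-7 or any of the benchmarked models: the data are synthetic 2-D points from two classes, the covariate shift is simulated as an additive offset on one feature, and the classifier is a simple nearest-centroid rule. All function names, shift values, and sample sizes are assumptions chosen for illustration.

```python
import random


def make_split(n_per_class, shift, rng):
    """Draw synthetic 2-D points for two classes.

    `shift` is added to every x-coordinate to mimic a covariate shift
    (hypothetical stand-in for, e.g., a change of instrument).
    """
    centers = {0: (0.0, 0.0), 1: (3.0, 0.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, 1.0) + shift, rng.gauss(cy, 1.0))
            data.append((point, label))
    return data


def fit_centroids(data):
    """Compute the mean point of each class in the training data."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest in squared distance."""
    x, y = point
    return min(
        centroids,
        key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
    )


def error_rate(centroids, data):
    """Fraction of points whose predicted label differs from the true one."""
    wrong = sum(predict(centroids, p) != label for p, label in data)
    return wrong / len(data)


rng = random.Random(0)
train = make_split(500, 0.0, rng)
centroids = fit_centroids(train)

# Four test sets with increasing simulated shift (values are arbitrary).
shifts = [0.0, 0.5, 1.5, 3.0]
errors = [error_rate(centroids, make_split(500, s, rng)) for s in shifts]

for s, e in zip(shifts, errors):
    print(f"shift={s:.1f}  error={e:.3f}")
```

Reporting one error rate per test set, rather than a single pooled number, is what makes the degradation under shift visible.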
