Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data

Abstract

Proper data transformation is an essential part of analysis. Choosing appropriate transformations for variables can enhance visualization, improve the efficacy of analytical methods, and increase data interpretability. However, determining appropriate transformations of variables from high-content imaging data poses new challenges. Imaging data produce hundreds of covariates from each of thousands of images in a corpus. Each of these covariates has a different distribution and may need a different transformation. Because there are hundreds of covariates, choosing an appropriate transformation for each of them by hand is infeasible. In this article, we explore simple, robust, and automatic transformations of high-content image data. A central application of our work is to microenvironment microarray bio-imaging data from the NIH LINCS program. We show that our robust transformations enhance visualization and improve the discovery of substantively relevant latent effects. These transformations enhance the analysis of individual image features and also improve data integration approaches that combine multiple features. We anticipate that these advantages will also carry over to the analysis of data from other high-content and highly multiplexed technologies such as Cell Painting or Cyclic Immunofluorescence. Software and further analysis can be found at gjhunt.github.io/rr. Supplementary materials for this article are available online.
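The abstract does not state which transformation family or selection criterion the article uses, so the following is only a minimal sketch of what an automatic, robust, per-feature transformation can look like. It assumes a Box-Cox family searched over a grid of exponents from -2 to 2, Bowley's quartile skewness as the robust symmetry criterion, a median/MAD robust z-score, and an outlier cutoff of |z| > 4. All of these choices, and the function names `choose_lambda`, `robust_z`, and `auto_transform`, are hypothetical illustrations, not the authors' method; their software is at gjhunt.github.io/rr.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Assumed, illustrative grid of Box-Cox exponents: -2.0, -1.9, ..., 2.0.
DEFAULT_GRID = [i / 10 for i in range(-20, 21)]


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value x (log when lam == 0)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def quartile_skewness(values: Sequence[float]) -> float:
    """Bowley's skewness; it depends only on the quartiles, so tail outliers barely move it."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    if spread == 0:
        return 0.0
    return (q3 + q1 - 2.0 * q2) / spread


def choose_lambda(values: Sequence[float], grid: Sequence[float] = DEFAULT_GRID) -> float:
    """Hypothetical selector: the grid exponent giving the most symmetric quartiles."""
    return min(grid, key=lambda lam: abs(quartile_skewness([box_cox(v, lam) for v in values])))


def robust_z(values: Sequence[float]) -> List[float]:
    """Center by the median, scale by 1.4826 * MAD (consistent with the SD for normal data)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]


def auto_transform(values: Sequence[float], cutoff: float = 4.0) -> Tuple[float, List[float], List[bool]]:
    """Transform one feature: pick lambda, robustly standardize, flag |z| > cutoff (assumed 4)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values; shift or use another family")
    lam = choose_lambda(values)
    z = robust_z([box_cox(v, lam) for v in values])
    flags = [abs(score) > cutoff for score in z]
    return lam, z, flags


if __name__ == "__main__":
    # Hypothetical right-skewed feature: log-symmetric bulk plus one gross outlier.
    feature = [math.exp(k / 10) for k in range(-30, 31)] + [1e12]
    lam, z, flags = auto_transform(feature)
    print(f"lambda={lam}, outliers flagged={sum(flags)}")  # lambda=0.0, outliers flagged=1
```

Because the criterion looks only at quartiles and the scale comes from the MAD, a single extreme image does not drive the choice of exponent, and the same routine can be applied unattended to each of the hundreds of image features. Applied column by column, the standardized features are on a common scale, which is one plausible starting point before combining them in a data integration step.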