flowClean: Automated identification and removal of fluorescence anomalies in flow cytometry data

Modern flow cytometry systems can be coupled to plate readers for high‐throughput acquisition. These systems allow hundreds of samples to be analyzed in a single day. Quality control of the data remains challenging, however, and is further complicated when a large number of parameters is measured in an experiment. Our examination of 29,228 publicly available FCS files from laboratories worldwide indicates that 13.7% have a fluorescence anomaly. In particular, fluorescence measurements for a sample may not remain stable over the collection time because of fluctuations in fluid dynamics, and the impact of these instabilities may differ between samples and among parameters. Therefore, we hypothesized that tracking cell populations (which represent a summary of all parameters) in centered log ratio space would provide a sensitive and consistent method of quality control. Here, we present flowClean, an algorithm that tracks subset frequency changes within a sample during acquisition and flags time periods with fluorescence perturbations leading to the emergence of false populations. Aberrant time periods are reported as a new parameter and added to a revised data file, allowing users to easily review and exclude those events from further analysis. We apply this method to proof‐of‐concept datasets and also to a subset of data from a recent vaccine trial. The algorithm flags events that are suspicious by visual inspection, as well as those showing more subtle effects that might not be consistently flagged by investigators reviewing the data manually, and it outperforms the current state of the art. flowClean is available as an R package on Bioconductor, as a module on the free‐to‐use GenePattern web server, and as a plugin for FlowJo X. © 2016 International Society for Advancement of Cytometry
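The core idea of the abstract can be illustrated with a short Python sketch. The sketch summarizes each time slice of an acquisition by its cell-population composition, moves those compositions into centered log ratio (CLR) space, and flags slices that drift away from the rest. For a composition x with D parts, clr(x)_i = ln(x_i / g(x)), where g(x) is the geometric mean of the parts. The sketch assumes that every event already carries an integer population label (for example, from gating) and that events are sorted by acquisition time. The bin size, the 0.5 pseudocount, and the "median + k times scaled MAD" cutoff on per-bin CLR distances are illustrative assumptions, and the function names are hypothetical. This is not flowClean's actual flagging procedure. It only demonstrates the CLR-tracking idea.

```python
# Illustrative sketch of CLR-based tracking of population frequencies over
# acquisition time. NOT flowClean's implementation; names are hypothetical.
import math
from statistics import median


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one composition (counts per population).

    A pseudocount is added first so an empty population does not give log(0);
    0.5 is an illustrative choice, not a value taken from flowClean.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def population_counts(labels, n_pops, events_per_bin):
    """Cut time-ordered population labels into consecutive bins of
    events_per_bin events and count each population per bin
    (the last bin may be smaller)."""
    bins = []
    for start in range(0, len(labels), events_per_bin):
        counts = [0] * n_pops
        for label in labels[start:start + events_per_bin]:
            counts[label] += 1
        bins.append(counts)
    return bins


def flag_time_bins(labels, n_pops, events_per_bin=100, k=3.0):
    """Return (flags, distances), one entry per time bin.

    Each bin's composition is CLR-transformed. Its score is the Euclidean
    distance in CLR space (the Aitchison distance) to the per-population
    median CLR vector. Bins scoring above median + k * scaled MAD are flagged
    (a hypothetical stand-in for flowClean's own decision rule).
    """
    clr_bins = [clr(c) for c in population_counts(labels, n_pops, events_per_bin)]
    reference = [median(column) for column in zip(*clr_bins)]
    distances = [math.dist(v, reference) for v in clr_bins]
    center = median(distances)
    mad = median([abs(d - center) for d in distances])
    # Floor the spread so a perfectly stable run still gets a usable cutoff.
    cutoff = center + k * max(1.4826 * mad, 1e-9)
    return [d > cutoff for d in distances], distances


# Synthetic acquisition: 10 bins of 100 events across three populations,
# with one bin (index 6) whose composition is perturbed.
stable = [0] * 50 + [1] * 30 + [2] * 20   # 50/30/20 split
shifted = [0] * 20 + [1] * 30 + [2] * 50  # perturbed split
labels = stable * 6 + shifted + stable * 3
flags, dist = flag_time_bins(labels, n_pops=3)
print(flags.index(True))  # 6
```

The comparison uses log ratios rather than raw frequencies, following the compositional-data view. When one population grows, the proportions of the others must fall. The whole CLR vector therefore moves, and a single distance per bin captures the shift no matter which populations are involved.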
