flowLearn: fast and precise identification and quality checking of cell populations in flow cytometry

Motivation: Identification of cell populations in flow cytometry is a critical part of the analysis and lays the groundwork for many applications and research discoveries. The current paradigm of manual analysis is time-consuming and subjective. A common goal of users is to replace manual analysis with automated methods that replicate their results. Supervised tools provide the best performance in such a use case; however, they require fine parameterization to obtain the best results. Hence, there is a strong need for methods that are fast to set up, accurate and interpretable.

Results: flowLearn is a semi-supervised approach for the quality-checked identification of cell populations. Using a very small number of manually gated samples, it predicts gates on other samples through density alignments, with high accuracy and speed. On two state-of-the-art datasets, our tool achieves F1-measures exceeding 0.99 for 31%, and exceeding 0.90 for 80%, of all analyzed populations. Furthermore, users can directly interpret and adjust automated gates on new sample files to iteratively improve the initial training.

Availability and implementation: flowLearn is available as an R package at https://github.com/mlux86/flowLearn. Evaluation data are publicly available online. Details can be found in the Supplementary Material.
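To make the idea of predicting gates through density alignment concrete, the following is a minimal sketch and not the flowLearn implementation or its API. It assumes a single channel, a Gaussian kernel density estimate, and plain dynamic time warping as the alignment method; the function names (`kde`, `dtw_path`, `transfer_threshold`, `f1_measure`), the bandwidth, and the simulated data are all hypothetical choices made for illustration. Python is used here only for convenience; the package itself is written in R.

```python
import numpy as np

def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    z = (grid[:, None] - values[None, :]) / bandwidth
    norm = len(values) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def dtw_path(a, b):
    """Plain dynamic time warping; returns aligned (index_a, index_b) pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while i > 1 or j > 1:
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]

def transfer_threshold(ref_values, ref_threshold, new_values, grid, bandwidth=0.3):
    """Map a gate threshold from a gated reference sample onto a new sample."""
    ref_density = kde(ref_values, grid, bandwidth)
    new_density = kde(new_values, grid, bandwidth)
    # Scale both densities to a peak of 1 so only their shapes are compared.
    path = dtw_path(ref_density / ref_density.max(), new_density / new_density.max())
    ref_idx = int(np.argmin(np.abs(grid - ref_threshold)))
    matched = [j for i, j in path if i == ref_idx]
    return float(np.mean(grid[matched]))

def f1_measure(predicted, truth):
    """F1-measure of a predicted boolean gate against the true membership."""
    tp = np.sum(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / np.sum(predicted)
    recall = tp / np.sum(truth)
    return float(2 * precision * recall / (precision + recall))

# Simulated data (an assumption): two populations, and a second sample whose
# channel is shifted by 1.5 units relative to the manually gated reference.
rng = np.random.default_rng(0)
reference = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
new_sample = reference + 1.5
truth = np.arange(1000) < 500          # first 500 cells form the population
grid = np.linspace(-5, 12, 171)

threshold = transfer_threshold(reference, 2.5, new_sample, grid)
f1 = f1_measure(new_sample < threshold, truth)
print(f"transferred threshold: {threshold:.2f}, F1-measure: {f1:.3f}")
```

In this sketch the manual gate is a single threshold at 2.5 on the reference sample. The alignment carries it to the matching valley of the shifted density, and the F1-measure then scores the predicted gate against the known population membership.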
