Multi-set Pre-processing of Multicolor Flow Cytometry Data

Flow Cytometry is an analytical technology that simultaneously measures multiple markers on each single cell. Tens of thousands to millions of single cells can be measured per sample, and each sample may contain a different number of cells. When all samples are bundled together for analysis, the data acquire a ‘multi-set’ structure. Many multivariate methods have been developed for Flow Cytometry data, but none of them considers this structure in its quantitative handling of the data. The standard pre-processing used by existing multivariate methods yields models that are influenced mainly by the samples with more cells, whereas such a model should provide a balanced view of the biomedical information within all measurements. We propose an alternative ‘multi-set’ pre-processing that corrects for the difference in the number of cells measured, balancing the relative importance of each multi-cell sample in the data while still using all data collected from these expensive analyses. Moreover, one case example shows how multi-set pre-processing may aid the removal of undesired measurement-to-measurement variability, and another shows how class-based multi-set pre-processing enhances the studied response when compared with the control reference samples. Our results show that adjusting data analysis algorithms to account for this multi-set structure may greatly benefit immunological insight and classification performance for Flow Cytometry data.
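To make the balancing idea concrete, the following is a minimal sketch in Python with NumPy. The abstract does not state the exact formulas, so the details here are assumptions for illustration only: the function name `multiset_autoscale` is hypothetical, the center is taken as the mean of the per-sample means, and the scale as the root of the averaged per-sample mean squared deviations, so that every sample counts once regardless of how many cells it contains.

```python
import numpy as np


def multiset_autoscale(samples):
    """Center and scale cells so that each sample carries equal weight.

    samples: list of arrays, each of shape (n_cells_i, n_markers).
    Returns the scaled samples plus the center and scale that were used.
    Assumption: this is one plausible reading of 'multi-set' pre-processing,
    not necessarily the procedure used by the authors.
    """
    # Mean of the per-sample means: a sample with 10 cells weighs as much
    # as a sample with 1,000,000 cells.
    sample_means = np.stack([s.mean(axis=0) for s in samples])
    center = sample_means.mean(axis=0)

    # Per-sample mean squared deviation from the shared center,
    # again averaged with equal weight per sample.
    sample_ms = np.stack([((s - center) ** 2).mean(axis=0) for s in samples])
    scale = np.sqrt(sample_ms.mean(axis=0))

    # Guard against constant markers to avoid division by zero.
    scale = np.where(scale > 0, scale, 1.0)

    scaled = [(s - center) / scale for s in samples]
    return scaled, center, scale


# Two samples with very different cell counts.
large = np.zeros((1000, 2))      # 1000 cells, all markers at 0
small = np.full((10, 2), 10.0)   # 10 cells, all markers at 10
scaled, center, scale = multiset_autoscale([large, small])
print(center)  # balanced center, not dominated by the large sample
```

In this toy case, autoscaling on the pooled cells would place the center close to the large sample (about 0.1 per marker), whereas the balanced center lies midway between the two samples. A class-based variant could, under the same assumptions, compute `center` and `scale` from the control samples only and then apply them to all samples.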
