PeacoQC: Peak‐based selection of high quality cytometry data

In cytometry analysis, a large number of markers is measured for thousands or millions of cells, resulting in high-dimensional datasets. During acquisition of these samples, anomalies such as clogs, flow-rate changes, or slow sample uptake can occur; these can distort the downstream analysis and can even lead to false discoveries. As these issues can be difficult to detect manually, an automated approach is recommended. To filter out the affected events, we created a novel quality control algorithm, Peak Extraction And Cleaning Oriented Quality Control (PeacoQC), that allows for automated cleaning of cytometry data. The algorithm determines density peaks per channel and then removes low-quality events based on their position in the isolation tree and on their mean absolute deviation distance to these density peaks. To evaluate PeacoQC's cleaning capability, we compared it to three existing quality control algorithms (flowAI, flowClean and flowCut) on a wide variety of datasets. PeacoQC was able to filter out all the different types of anomalies in flow, mass and spectral cytometry data, whereas each of the other methods struggled with at least one type. In the quantitative comparison, PeacoQC obtained the highest median balanced accuracy and a running time similar to that of the other algorithms, while scaling better to large files. To check that the parameters chosen in the PeacoQC algorithm are robust, the cleaning tool was run on 16 public datasets. After inspection, only one sample was found for which the parameters should be further optimized. The other 15 datasets were analyzed correctly, indicating a robust parameter choice. Overall, we present a fast and accurate quality control algorithm that outperforms existing tools and ensures high-quality data that can be used for further downstream analysis. An R implementation is available.
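The peak-based filtering idea can be illustrated with a minimal Python sketch. It is not the PeacoQC implementation: it handles a single channel, approximates each density peak by the fullest histogram bucket, uses a median-based absolute deviation as the distance measure, and leaves out the isolation tree step entirely. The function names (`density_peak`, `flag_bins`), the bin size and the threshold are assumptions chosen for illustration only.

```python
import statistics


def density_peak(values, n_buckets=20):
    """Crude density peak: centre of the fullest histogram bucket (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical, so the peak is that value
        return lo
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # Clamp the index so that v == hi lands in the last bucket.
        counts[min(int((v - lo) / width), n_buckets - 1)] += 1
    fullest = counts.index(max(counts))
    return lo + (fullest + 0.5) * width


def flag_bins(channel, events_per_bin=500, mad_threshold=6.0):
    """Return one flag per time-ordered bin; True marks a suspicious bin.

    `channel` holds the values of one marker in acquisition order.
    `events_per_bin` and `mad_threshold` are illustrative defaults.
    """
    # Split the events into consecutive bins along the acquisition order.
    bins = [channel[i:i + events_per_bin]
            for i in range(0, len(channel), events_per_bin)]
    peaks = [density_peak(b) for b in bins]

    # Robust centre and spread of the per-bin peak positions.
    centre = statistics.median(peaks)
    deviations = [abs(p - centre) for p in peaks]
    mad = statistics.median(deviations)

    # A bin is flagged when its peak lies more than mad_threshold * MAD
    # away from the median peak. If the MAD is zero, any nonzero
    # deviation is flagged.
    return [d > mad_threshold * mad for d in deviations]
```

Calling `flag_bins` on a channel in which the signal shifts partway through acquisition should mark the bins covering the shifted stretch, and the events in those bins could then be removed before downstream analysis.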
