AUTOMATED GATING OF PORTABLE CYTOMETER DATA BASED ON SKEW t MIXTURE MODELS

A major component of flow cytometry (FCM) data analysis is gating, the process of identifying homogeneous groups of cells. With the rapid development of portable flow cytometers, manual gating techniques can no longer meet the demand for accurate and rapid analysis of samples. To provide a practical solution for portable devices, we propose a flexible, statistical model-based clustering approach for identifying cell populations in FCM data. This approach, which mimics the manual gating process, employs a finite mixture model whose component densities are skew t distributions and estimates the parameters with an expectation-maximization (EM) algorithm. Analysis of data from an experiment on a patient’s peripheral blood samples shows that the proposed method is more robust against outliers than current state-of-the-art automated gating methods, is more flexible in clustering symmetric data, and achieves a lower misclassification rate when handling highly asymmetric data (0.06442 for the skew t method). The proposed method will improve data analysis on portable flow cytometers, especially for users without professional training.
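As a rough illustration of the kind of model described above, the following minimal sketch evaluates a univariate skew t mixture and the posterior membership probabilities that an EM E-step would compute. It assumes the Azzalini-type skew t density, which may differ from the multivariate parameterization used in the paper; the function names, the parameter names (`xi`, `sigma`, `lam`, `nu`), and the example component values are all hypothetical, and the t CDF is approximated by simple numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """Student's t CDF via Simpson's rule on [0, |x|] plus symmetry.

    A crude approximation chosen to avoid external dependencies.
    """
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = min(total * h / 3, 0.5)
    return 0.5 + area if x > 0 else 0.5 - area


def skew_t_pdf(x, xi, sigma, lam, nu):
    """Univariate skew t density (Azzalini-type form, assumed here).

    xi: location, sigma: scale, lam: skewness, nu: degrees of freedom.
    With lam = 0 this reduces to a scaled Student's t density.
    """
    z = (x - xi) / sigma
    w = lam * z * math.sqrt((nu + 1) / (nu + z * z))
    return 2 / sigma * t_pdf(z, nu) * t_cdf(w, nu + 1)


def responsibilities(x, components):
    """Posterior probability that x belongs to each mixture component."""
    weighted = [
        c["weight"] * skew_t_pdf(x, c["xi"], c["sigma"], c["lam"], c["nu"])
        for c in components
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical two-population example (values are illustrative only).
components = [
    {"weight": 0.5, "xi": 0.0, "sigma": 1.0, "lam": 2.0, "nu": 5.0},
    {"weight": 0.5, "xi": 10.0, "sigma": 1.0, "lam": -2.0, "nu": 5.0},
]
r = responsibilities(1.0, components)
gate = max(range(len(r)), key=lambda k: r[k])  # index of the assigned cluster
```

In a full EM fit, these membership probabilities would be recomputed at every iteration and used to update the weights and the skew t parameters; assigning each event to its most probable component then plays the role of the gate.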
