A two-stage clustering technique for automatic biaxial gating of flow cytometry data

Measuring multiple markers on single cells with flow cytometry has several biological applications, including improving our understanding of the behavior of cellular systems, identifying rare cell populations, and personalizing medication. A common and critical issue in existing methods is estimating the number of cellular populations, which heavily affects the accuracy of the results. In this work, we propose a novel technique that estimates the number of dominant subtypes in flow cytometry datasets and identifies them. Our experiments on 42 flow cytometry datasets indicate that the technique identifies the main cellular populations accurately (F-measure > 91%).
