Distance metric learning using random forest for cytometry data

Visualization and clustering of single-cell mass cytometry (CyTOF) data are analytic techniques used to identify different cell types. Most such techniques rely on distance measures, such as the Euclidean norm, that lose their effectiveness as the data dimension increases because of the curse of dimensionality. In this paper, we propose a new cell distance, called CytoRFD, that is based on the Random Forest (RF) concept. Experimental results show that the proposed distance achieves much higher quality and effectiveness in large-scale data analysis than traditional metrics, especially for CyTOF data.
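To illustrate the general idea of a forest-based distance, here is a minimal sketch. It is not the CytoRFD algorithm, whose details are not given in this abstract. It assumes a simplified variant: an ensemble of completely random, axis-aligned partition trees (rather than a trained Breiman-style forest), where the distance between two cells is one minus the fraction of trees in which they land in the same leaf. The function names, parameters, and toy "marker" data are all hypothetical.

```python
import random


def random_partition(rows, indices, depth, rng):
    """Split sample indices with random axis-aligned cuts; return the leaves."""
    if depth == 0 or len(indices) <= 1:
        return [indices]
    feature = rng.randrange(len(rows[0]))
    values = [rows[i][feature] for i in indices]
    lo, hi = min(values), max(values)
    if lo == hi:
        return [indices]
    threshold = rng.uniform(lo, hi)
    left = [i for i in indices if rows[i][feature] < threshold]
    right = [i for i in indices if rows[i][feature] >= threshold]
    if not left or not right:
        return [indices]
    return (random_partition(rows, left, depth - 1, rng)
            + random_partition(rows, right, depth - 1, rng))


def forest_distance(rows, n_trees=200, depth=3, seed=0):
    """Distance = 1 - (fraction of trees in which two samples share a leaf)."""
    rng = random.Random(seed)
    n = len(rows)
    shared = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in random_partition(rows, list(range(n)), depth, rng):
            for i in leaf:
                for j in leaf:
                    shared[i][j] += 1
    return [[1.0 - shared[i][j] / n_trees for j in range(n)] for i in range(n)]


# Toy data (assumption): two synthetic "cell types" measured on 3 markers.
data_rng = random.Random(1)
group_a = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
group_b = [[data_rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(10)]
cells = group_a + group_b

D = forest_distance(cells)

# Compare average distance within a group against distance across groups.
within_pairs = [D[i][j] for i in range(10) for j in range(10) if i != j]
between_pairs = [D[i][j] for i in range(10) for j in range(10, 20)]
within = sum(within_pairs) / len(within_pairs)
between = sum(between_pairs) / len(between_pairs)
print(f"mean within-group distance:  {within:.3f}")
print(f"mean between-group distance: {between:.3f}")
```

Because each tree only compares one marker at a time against a threshold, a distance of this kind depends on how often cells are partitioned together rather than on a sum over all coordinates, which is one commonly cited reason tree-based proximities are considered for high-dimensional data.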
