Self-Organizing Nebulous Growths for Robust and Incremental Data Visualization

Nonparametric dimensionality reduction techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) are effective at visualizing data sets of fixed size, but they cannot incrementally map and insert new data points into an existing visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data visualization, i.e., the incremental addition of new data while preserving the structure of the existing visualization. Moreover, SONG can handle new data increments whether they are similar (homogeneous) or dissimilar (heterogeneous) to the previously observed data distribution. We evaluate SONG on a variety of real and simulated data sets. The results show that SONG outperforms Parametric t-SNE, t-SNE, and UMAP in incremental data visualization. In particular, for heterogeneous increments, SONG improves over Parametric t-SNE by 14.98% on the Fashion-MNIST data set and 49.73% on the MNIST data set in cluster quality measured by Adjusted Mutual Information (AMI) scores; for similar or homogeneous increments, the improvements are 8.36% and 42.26%, respectively. Furthermore, even when these data sets are presented all at once, SONG performs better than or comparably to UMAP, and better than t-SNE. We also demonstrate that the algorithmic foundations of SONG make it more tolerant to noise than UMAP and t-SNE, providing greater utility for data with high variance, heavily mixed clusters, or noise.
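
The incremental setting and the AMI-based evaluation described above can be illustrated with a short sketch. The following is a minimal, illustrative protocol rather than the paper's implementation: since SONG's code is not shown here, UMAP (whose `transform` method maps new points into a fitted embedding) stands in as the incremental embedder, MNIST is fetched from OpenML, the batch sizes are arbitrary, and k-means with 10 clusters is an assumed choice for deriving cluster labels before scoring them with adjusted mutual information.

```python
# Minimal sketch (not the paper's exact protocol): fit an embedding on an initial
# batch, project a later "increment" into it, and score cluster quality with
# Adjusted Mutual Information (AMI), the metric quoted in the abstract.
import numpy as np
import umap                                   # stand-in embedder with a transform() step
from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

# Load MNIST and split it into an initial batch and a later increment.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
init_idx, inc_idx = idx[:10_000], idx[10_000:12_000]   # illustrative sizes

# Fit on the initial data only, then map the increment into the existing layout.
reducer = umap.UMAP(n_components=2, random_state=0).fit(X[init_idx])
emb_init = reducer.embedding_
emb_inc = reducer.transform(X[inc_idx])       # incremental mapping of new points

# Score how well clusters in the combined embedding recover the true labels.
emb_all = np.vstack([emb_init, emb_inc])
labels_all = np.concatenate([y[init_idx], y[inc_idx]])
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(emb_all)
print("AMI:", adjusted_mutual_info_score(labels_all, clusters))
```

A heterogeneous increment, in the sense used above, could be simulated by withholding some digit classes from the initial batch and introducing them only in the increment; a homogeneous increment would draw both batches from the same class mixture.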
