Linear tSNE optimization for the Web

The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has in recent years become one of the most widely used and insightful techniques for the exploratory analysis of high-dimensional data. tSNE reveals clusters of high-dimensional data points at different scales while requiring only minimal parameter tuning. Despite these advantages, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of tSNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the tSNE embedding for large datasets. In this work, we present a novel approach to the minimization of the tSNE objective function that relies heavily on modern graphics hardware and has linear computational complexity. Our technique not only outperforms the state of the art, but can even be executed on the client side in a browser. We propose to approximate the repulsive forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL. This approximation allows us to reformulate the tSNE minimization problem as a series of tensor operations that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations.
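
To make the idea of texture-based repulsion concrete, the following is a minimal CPU sketch in plain TypeScript, not the WebGL/TensorFlow.js implementation described above. It rests on assumptions the abstract does not spell out: the "texture" is modeled as a fixed-resolution grid (not an adaptive one) storing a scalar field S(p) = Σ_j 1/(1+‖p−y_j‖²) and a vector field V(p) = Σ_j (p−y_j)/(1+‖p−y_j‖²)², and each point reads its repulsive force from the grid by bilinear interpolation. All names (`buildFields`, `sample`, `repulsiveForces`, `exactRepulsiveForces`) are hypothetical.

```typescript
type Point = [number, number];

interface Fields {
  res: number;        // grid nodes per side (the "texture" resolution)
  min: Point;         // lower-left corner of the embedding bounding box
  step: Point;        // node spacing in x and y
  data: Float64Array; // res * res * 3 values: S, Vx, Vy per node
}

// Evaluate the assumed fields S and V on a regular grid over the bounding box.
// Cost is O(N * res^2): linear in the number of points N for a fixed resolution.
function buildFields(pts: Point[], res: number): Fields {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const [x, y] of pts) {
    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
  }
  // Fall back to a unit span if all points share a coordinate.
  const step: Point = [(maxX - minX || 1) / (res - 1), (maxY - minY || 1) / (res - 1)];
  const data = new Float64Array(res * res * 3);
  for (let gy = 0; gy < res; gy++) {
    for (let gx = 0; gx < res; gx++) {
      const px = minX + gx * step[0], py = minY + gy * step[1];
      let s = 0, vx = 0, vy = 0;
      for (const [x, y] of pts) {
        const dx = px - x, dy = py - y;
        const k = 1 / (1 + dx * dx + dy * dy); // Student-t kernel
        s += k; vx += k * k * dx; vy += k * k * dy;
      }
      data.set([s, vx, vy], (gy * res + gx) * 3);
    }
  }
  return { res, min: [minX, minY], step, data };
}

// Bilinear interpolation of the three channels (S, Vx, Vy) at position p.
function sample(f: Fields, p: Point): [number, number, number] {
  const u = (p[0] - f.min[0]) / f.step[0], v = (p[1] - f.min[1]) / f.step[1];
  const x0 = Math.min(Math.max(Math.floor(u), 0), f.res - 2);
  const y0 = Math.min(Math.max(Math.floor(v), 0), f.res - 2);
  const fu = u - x0, fv = v - y0;
  const out: [number, number, number] = [0, 0, 0];
  for (let c = 0; c < 3; c++) {
    const at = (x: number, y: number) => f.data[(y * f.res + x) * 3 + c];
    out[c] = (1 - fv) * ((1 - fu) * at(x0, y0) + fu * at(x0 + 1, y0))
           + fv * ((1 - fu) * at(x0, y0 + 1) + fu * at(x0 + 1, y0 + 1));
  }
  return out;
}

// Approximate normalization Z and repulsive forces from the fields.
// Needs at least two distinct points, otherwise Z is zero.
function repulsiveForces(pts: Point[], res = 64): { z: number; forces: Point[] } {
  const f = buildFields(pts, res);
  const samples = pts.map((p) => sample(f, p));
  // Each point adds kernel(0) = 1 to its own sample of S; subtract that self-term.
  const z = samples.reduce((acc, s) => acc + s[0] - 1, 0);
  const forces = samples.map((s): Point => [s[1] / z, s[2] / z]);
  return { z, forces };
}

// Brute-force O(N^2) reference, for comparison only.
function exactRepulsiveForces(pts: Point[]): { z: number; forces: Point[] } {
  let z = 0;
  const raw = pts.map((): Point => [0, 0]);
  for (let i = 0; i < pts.length; i++) {
    for (let j = 0; j < pts.length; j++) {
      if (i === j) continue;
      const dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
      const k = 1 / (1 + dx * dx + dy * dy);
      z += k; raw[i][0] += k * k * dx; raw[i][1] += k * k * dy;
    }
  }
  return { z, forces: raw.map((r): Point => [r[0] / z, r[1] / z]) };
}
```

Calling `repulsiveForces(points)` and `exactRepulsiveForces(points)` on the same small embedding should give closely matching values, with the gap shrinking as `res` grows. In this sketch the grid is filled by looping over all points per node; a GPU version would presumably draw a kernel per point into the texture instead, which is the part this example does not attempt to reproduce.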
