GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data

Interactive visual exploration of large and multidimensional data still needs more efficient $ND \rightarrow 2D$ data embedding (DE) algorithms. We claim that the visualization of very high-dimensional data is equivalent to the problem of 2D embedding of undirected kNN-graphs. We demonstrate that high-quality embeddings can be produced with minimal time and memory complexity. A very efficient GPU version of the IVHD (interactive visualization of high-dimensional data) algorithm is presented, and we compare it to the state-of-the-art GPU-implemented DE methods: BH-SNE-CUDA and AtSNE-CUDA. We show that memory and time requirements for IVHD-CUDA are radically lower than those for the baseline codes. For example, IVHD-CUDA is almost 30 times faster than AtSNE-CUDA in embedding the largest ($M=1.4\cdot 10^6$) YAHOO dataset, excluding kNN-graph generation, which is the same for all the methods. We conclude that, at the expense of a minor deterioration of embedding quality compared to the baseline algorithms, IVHD preserves the main structural properties of ND data in 2D well, at a radically lower computational budget. Thus, our method can be a good candidate for truly big data ($M=10^{8+}$) interactive visualization.
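To make the notion of an undirected kNN-graph concrete, the following is a minimal sketch, not the IVHD-CUDA implementation. It assumes Euclidean distance and a brute-force neighbor search, which is only practical for tiny inputs; the function name `knn_graph` and the toy points are hypothetical and chosen purely for illustration.

```python
import math


def knn_graph(points, k):
    """Return an undirected kNN-graph as a set of (i, j) edges with i < j.

    Illustrative assumption: brute-force Euclidean search, O(M^2) distances.
    """
    edges = set()
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            # Storing each edge as (min, max) drops direction, so i -> j
            # and j -> i collapse into a single undirected edge.
            edges.add((min(i, j), max(i, j)))
    return edges


# Two well-separated pairs of 3D points (toy data, not from the paper).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
edges = knn_graph(points, k=1)
```

With `k=1`, each point links only to its closest neighbor, so the two pairs form two disconnected edges. An embedding method then only has to place the graph's vertices in 2D so that connected vertices stay close.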
