Very Fast Interactive Visualization of Large Sets of High-dimensional Data

Embedding high-dimensional data into 2D/3D space is the most popular way of visualizing it. Despite recent advances in very accurate dimensionality reduction algorithms such as BH-SNE, Q-SNE and LoCH, their relatively high computational complexity remains an obstacle to interactive visualization of truly large datasets consisting of M ~ 10^6+ feature vectors of high dimensionality N ~ 10^3+. We show that a new variant of multidimensional scaling (MDS), nr-MDS, can be up to two orders of magnitude faster than modern dimensionality reduction algorithms. We postulate that its computational and memory complexities are linear, O(M). At the same time, our method preserves high separability of data in the 2D/3D target spaces, similar to that obtained by state-of-the-art dimensionality reduction algorithms. We present the results of applying nr-MDS to the visualization of data repositories such as 20 Newsgroups (M = 1.8×10^4), MNIST (M = 7×10^4) and REUTERS (M = 2.67×10^5).
