Efficient optimization for data visualization as an information retrieval task

Multivariate data sets are often visualized by mapping the data onto a low-dimensional display with nonlinear dimensionality reduction (NLDR) methods. Many NLDR methods are designed for tasks such as manifold learning rather than for low-dimensional visualization, and can perform poorly when used for visualization. In earlier work we introduced a formalism in which NLDR for visualization is treated as an information retrieval task, together with a novel NLDR method, the Neighbor Retrieval Visualizer (NeRV), which outperforms previous methods. The remaining concern is that NeRV has quadratic computational complexity with respect to the number of data points. We introduce an efficient learning algorithm for NeRV in which relationships between data points are approximated through mixture modeling, yielding near-linear computational complexity with respect to the number of data points. The method inherits the information retrieval interpretation of the original NeRV, is much faster to optimize as the number of data points grows, and maintains good visualization performance.
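To make the quadratic cost concrete, the following is a minimal sketch of the original NeRV objective as it is usually stated: a tradeoff, weighted by a parameter lambda, between two Kullback-Leibler divergences over neighbor distributions in the input space (p) and on the display (q). It is an illustration under assumptions, not the authors' implementation and not the accelerated algorithm described above. The function names `neighbor_probs` and `nerv_cost`, the single fixed Gaussian width `sigma`, and the small constant `eps` are all hypothetical choices made here; NeRV itself typically sets the width separately for each point.

```python
import numpy as np

def neighbor_probs(X, sigma=1.0):
    """Row-normalized Gaussian neighbor probabilities (assumed fixed width sigma).

    Builds the full N x N matrix of pairwise squared distances, which is
    the source of the quadratic cost in the number of data points.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    logits = -d2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def nerv_cost(P, Q, lam=0.5, eps=1e-12):
    """lam * sum_i KL(p_i || q_i) + (1 - lam) * sum_i KL(q_i || p_i).

    The first term penalizes missed neighbors (a recall-like error), the
    second penalizes false neighbors (a precision-like error).
    eps is an assumed guard against log(0).
    """
    off = ~np.eye(P.shape[0], dtype=bool)        # skip the zero diagonal
    p, q = P[off], Q[off]
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return lam * kl_pq + (1.0 - lam) * kl_qp

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # high-dimensional data (synthetic)
Y = rng.normal(size=(50, 2))    # a candidate 2-D display (random, not optimized)
P = neighbor_probs(X)
Q = neighbor_probs(Y)
cost = nerv_cost(P, Q, lam=0.5)
```

Both the storage and the evaluation of `P` and `Q` scale with the square of the number of points, which is the bottleneck that a mixture-based approximation of the pairwise relationships is meant to avoid.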
