A graph-based N-body approximation with application to stochastic neighbor embedding

We propose a novel approximation technique, bubble approximation (BA), for repulsion forces in an N-body problem in which attraction has a limited range and repulsion acts between all points. Such systems occur frequently in dimension reduction and graph drawing. Like tree codes, the established N-body approximation method, BA replaces several point-to-point computations with one area-to-point computation. The novelty of BA is that it considers not only the magnitudes but also the directions of the forces from the area, so its area-to-point approximations are applicable anywhere in the space. The joint effect of the forces from inside the area is calculated analytically, assuming a homogeneous mass of points inside the area. These two features free BA from the hierarchical data structures and complicated bookkeeping of interactions that plague tree codes. Instead, BA uses a simple graph to control the computations. The graph provides a sparse matrix which, suitably weighted, replaces the full matrix of pairwise comparisons in the N-body problem. As a concrete example, we implement a sparse-matrix version of stochastic neighbor embedding (a dimension reduction method) and demonstrate its good performance by comparison with the full-matrix method and with three different approximate versions of the same method.
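To make the shared idea of "one area-to-point computation instead of several point-to-point computations" concrete, the following minimal sketch compares an exact repulsion sum with a single computation against the centroid of a cluster. It is not the BA formula: the inverse-square force law, the centroid stand-in for the area, and the names `repulsion_exact` and `repulsion_area` are illustrative assumptions. BA instead integrates analytically over a homogeneous mass and accounts for force directions, which this simple version does not do.

```python
import math
import random


def repulsion_exact(target, sources):
    """Sum the repulsion on `target` from every source point individually.

    Assumed force law (illustration only): magnitude 1/r^2, directed away
    from the source.
    """
    fx = fy = 0.0
    for sx, sy in sources:
        dx, dy = target[0] - sx, target[1] - sy
        r2 = dx * dx + dy * dy
        r = math.sqrt(r2)
        # Unit direction (dx/r, dy/r) scaled by magnitude 1/r^2.
        fx += dx / (r * r2)
        fy += dy / (r * r2)
    return fx, fy


def repulsion_area(target, sources):
    """Replace all sources by one pseudo-point at their centroid.

    The pseudo-point carries the total mass len(sources). This is the plain
    centroid approximation, not the analytic BA expression.
    """
    n = len(sources)
    cx = sum(s[0] for s in sources) / n
    cy = sum(s[1] for s in sources) / n
    dx, dy = target[0] - cx, target[1] - cy
    r2 = dx * dx + dy * dy
    r = math.sqrt(r2)
    return n * dx / (r * r2), n * dy / (r * r2)


# A cluster of 200 points in a unit square, and a target far from it.
rng = random.Random(0)
cluster = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(200)]
far_point = (10.0, 0.0)

exact = repulsion_exact(far_point, cluster)
approx = repulsion_area(far_point, cluster)

# Relative error of the single area-to-point computation.
rel_err = math.hypot(exact[0] - approx[0], exact[1] - approx[1]) / math.hypot(*exact)
print(f"exact={exact}, approx={approx}, relative error={rel_err:.2e}")
```

The centroid version is accurate here only because the target is far from the cluster. Moving `far_point` close to or inside the cluster makes the error grow quickly, which is the restriction that the abstract says BA avoids by modelling directions as well as magnitudes.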
