Accelerating t-SNE using tree-based algorithms

The paper investigates the acceleration of t-SNE (an embedding technique commonly used to visualize high-dimensional data in scatter plots) using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) time. Our experiments show that the resulting algorithms substantially accelerate t-SNE and make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
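To make the Barnes-Hut idea more concrete, here is a minimal pure-Python sketch of the repulsive part of the t-SNE gradient for a 2-D embedding. It assumes the standard decomposition of the gradient as 4 (F_attr - F_rep), where F_rep,i = Σ_j q_ij² Z (y_i - y_j), q_ij Z = (1 + ||y_i - y_j||²)^-1, and Z = Σ_{k≠l} (1 + ||y_k - y_l||²)^-1. A quadtree summarizes each cell by its point count and center of mass. When a cell is small relative to its distance from y_i, all of its points are treated as a single point, which is what brings the cost down to roughly O(N log N). The names `QuadTree`, `repulsion`, `bh_repulsive_forces`, and `exact_repulsive_forces` are hypothetical. The opening test (cell diagonal < θ times the distance to the cell's center of mass) is the classic Barnes-Hut criterion and may differ from the one used in the paper. The sketch also assumes the points are distinct and N ≥ 2. The attractive term and the dual-tree variant are not shown.

```python
import math
import random


class QuadTree:
    """Point-region quadtree; each node keeps its point count and center of mass."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell center and half-width
        self.count = 0
        self.com_x = self.com_y = 0.0  # running mean of the points in this cell
        self.point = None              # index of the single point held by a leaf
        self.children = None

    def _subdivide(self):
        h = self.half / 2
        self.children = [QuadTree(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]

    def _child_for(self, p):
        # Index order matches the (dx, dy) loop in _subdivide.
        return self.children[(2 if p[0] >= self.cx else 0) + (1 if p[1] >= self.cy else 0)]

    def insert(self, idx, pts):
        x, y = pts[idx]
        self.count += 1
        self.com_x += (x - self.com_x) / self.count
        self.com_y += (y - self.com_y) / self.count
        if self.children is None and self.count == 1:
            self.point = idx  # empty leaf: store the point here
            return
        if self.children is None:
            # Occupied leaf: split it and push the existing point down one level.
            self._subdivide()
            old, self.point = self.point, None
            self._child_for(pts[old]).insert(old, pts)
        self._child_for((x, y)).insert(idx, pts)


def repulsion(node, i, pts, theta):
    """Return (fx, fy, z): unnormalized repulsive force on point i and its share of Z."""
    if node.count == 0 or node.point == i:
        return 0.0, 0.0, 0.0
    dx = pts[i][0] - node.com_x
    dy = pts[i][1] - node.com_y
    d2 = dx * dx + dy * dy
    diag = 2.0 * math.sqrt(2.0) * node.half
    # Assumed classic Barnes-Hut test: summarize the cell by its center of mass
    # when the cell is small relative to its distance from y_i.
    if node.children is None or diag < theta * math.sqrt(d2):
        q = 1.0 / (1.0 + d2)  # Student-t kernel, i.e. q_ij * Z
        return node.count * q * q * dx, node.count * q * q * dy, node.count * q
    fx = fy = z = 0.0
    for child in node.children:
        cfx, cfy, cz = repulsion(child, i, pts, theta)
        fx, fy, z = fx + cfx, fy + cfy, z + cz
    return fx, fy, z


def bh_repulsive_forces(pts, theta=0.5):
    """Approximate F_rep,i = sum_j q_ij^2 Z (y_i - y_j) for every point."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = QuadTree((max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2, half)
    for i in range(len(pts)):
        root.insert(i, pts)
    raw = [repulsion(root, i, pts, theta) for i in range(len(pts))]
    Z = sum(r[2] for r in raw)  # estimate of sum_{k != l} (1 + ||y_k - y_l||^2)^-1
    return [(fx / Z, fy / Z) for fx, fy, _ in raw]


def exact_repulsive_forces(pts):
    """O(N^2) reference computation of the same quantity."""
    raw = []
    for i, (xi, yi) in enumerate(pts):
        fx = fy = z = 0.0
        for j, (xj, yj) in enumerate(pts):
            if i != j:
                q = 1.0 / (1.0 + (xi - xj) ** 2 + (yi - yj) ** 2)
                fx, fy, z = fx + q * q * (xi - xj), fy + q * q * (yi - yj), z + q
        raw.append((fx, fy, z))
    Z = sum(r[2] for r in raw)
    return [(fx / Z, fy / Z) for fx, fy, _ in raw]


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(500)]
    approx = bh_repulsive_forces(pts, theta=0.5)
    exact = exact_repulsive_forces(pts)
    err = max(math.hypot(a[0] - e[0], a[1] - e[1]) for a, e in zip(approx, exact))
    print("max absolute force error:", err)
```

Setting `theta=0` makes every cell fail the test, so the traversal reaches every leaf and reproduces the exact O(N²) result. Larger values of `theta` prune more of the tree and trade accuracy for speed.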
