Shift-invariant similarities circumvent distance concentration in stochastic neighbor embedding and variants

Dimensionality reduction aims at representing high-dimensional data in low-dimensional spaces, mainly for visualization and exploratory purposes. As an alternative to projections on linear subspaces, nonlinear dimensionality reduction, also known as manifold learning, can provide data representations that preserve structural properties such as pairwise distances or local neighborhoods. Very recently, similarity preservation emerged as a new paradigm for dimensionality reduction, with methods such as stochastic neighbor embedding (SNE) and its variants. Experimentally, these methods significantly outperform the more classical methods based on distance or transformed distance preservation. This paper explains, both theoretically and experimentally, the reasons for this performance. In particular, it details why the phenomenon of distance concentration is an impediment to efficient dimensionality reduction and how SNE and its variants circumvent this difficulty by using similarities that are invariant to shifts with respect to squared distances. The paper also proposes a generalized definition of shift-invariant similarities that extends the applicability of SNE to noisy data.
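The shift invariance mentioned above can be illustrated numerically. The following is a minimal sketch, assuming the usual normalized Gaussian form of SNE similarities; the function name `sne_similarities`, the bandwidth `sigma`, and the distance values are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def sne_similarities(sq_dists, sigma=1.0):
    """Normalized Gaussian similarities from one point to its neighbors.

    Assumed form: p_j = exp(-d_j^2 / (2 sigma^2)) / sum_k exp(-d_k^2 / (2 sigma^2)).
    """
    logits = -np.asarray(sq_dists, dtype=float) / (2.0 * sigma**2)
    logits = logits - logits.max()  # stabilizes exp(); this is itself a shift
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical squared distances from one point to four neighbors.
sq_dists = np.array([1.0, 2.0, 4.0, 7.0])

# Adding a constant to every squared distance mimics concentration:
# distances become large and relatively similar to one another.
shift = 100.0

p_original = sne_similarities(sq_dists)
p_shifted = sne_similarities(sq_dists + shift)

# The normalization cancels the common factor exp(-shift / (2 sigma^2)),
# so the similarities are unchanged by the shift.
shift_invariant = np.allclose(p_original, p_shifted)
print(shift_invariant)
```

In this sketch the shift multiplies every unnormalized weight by the same factor, which the normalization then cancels; a method that compares raw distances has no such cancellation.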
