Power-law stochastic neighbor embedding

Stochastic neighbor embedding (SNE) maps observations from a high-dimensional space into a low-dimensional space while preserving neighbor identities. It does so by minimizing the Kullback-Leibler divergence between the pairwise distributions in the two spaces, both of which are assumed to be Gaussian. Data visualization can be improved by adopting t-SNE, which uses a Student's t-distribution in the low-dimensional space. However, data pairs in the latent space are forced to squeeze together because of the loss of dimensions. This study incorporates the power-law distribution into the construction of a new method, p-SNE. This unsupervised p-SNE increases the physical forces in neighbor embedding so that neighbors in the low-dimensional space can be adjusted flexibly to reflect the neighborhood structure in the high-dimensional space. Experiments on three learning tasks show that p-SNE preserves the manifold or data structure better than SNE and t-SNE.
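To make the objective concrete, the following is a minimal sketch of the Kullback-Leibler cost that SNE and t-SNE minimize, written in plain Python. It rests on several assumptions that are not taken from this abstract: a symmetric (joint) formulation, a single fixed Gaussian bandwidth `sigma` instead of per-point bandwidths, and toy data. The helper names (`joint_probs`, `gaussian`, `student_t`, `kl_divergence`) are hypothetical. The power-law kernel of p-SNE is not specified here, so it is deliberately left out.

```python
import math

def sq_dists(points):
    """Pairwise squared Euclidean distances as a nested list."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]

def joint_probs(points, kernel):
    """Normalize kernel(d^2) over all pairs i != j into a joint distribution."""
    d2 = sq_dists(points)
    n = len(points)
    w = [[kernel(d2[i][j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def gaussian(d2, sigma=1.0):
    """Gaussian similarity (assumed fixed bandwidth, for illustration only)."""
    return math.exp(-d2 / (2.0 * sigma ** 2))

def student_t(d2):
    """Student's t similarity with one degree of freedom (heavy tails)."""
    return 1.0 / (1.0 + d2)

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs; eps guards against log(0)."""
    return sum(p * math.log((p + eps) / (q + eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)

# Toy data (hypothetical): two tight pairs of points in 3-D.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
# One 2-D layout that keeps the neighbors together, one that splits them.
Y_good = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
Y_bad = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0], [3.1, 3.0]]

P = joint_probs(X, gaussian)  # high-dimensional similarities

# SNE-style cost: Gaussian kernel in the low-dimensional space.
sne_good = kl_divergence(P, joint_probs(Y_good, gaussian))
sne_bad = kl_divergence(P, joint_probs(Y_bad, gaussian))

# t-SNE-style cost: Student's t kernel in the low-dimensional space.
tsne_good = kl_divergence(P, joint_probs(Y_good, student_t))
tsne_bad = kl_divergence(P, joint_probs(Y_bad, student_t))

print(sne_good < sne_bad, tsne_good < tsne_bad)
```

In this sketch the only difference between the two costs is the kernel passed to `joint_probs` for the low-dimensional points. A layout that separates true neighbors is penalized under both kernels, and an embedding method would move the low-dimensional points along the gradient of this cost to reduce it.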
