Paper information - Power-law stochastic neighbor embedding

Stochastic neighbor embedding (SNE) maps observations from a high-dimensional space into a low-dimensional space that preserves neighbor identities. It does so by minimizing the Kullback-Leibler divergence between the pairwise distributions in the two spaces, both of which are assumed to be Gaussian. t-SNE improves data visualization by using a Student's t-distribution in the low-dimensional space. However, data pairs in the latent space are still squeezed together because of the loss of dimensions. This study incorporates the power-law distribution into the construction of p-SNE. The unsupervised p-SNE strengthens the physical forces in neighbor embedding, so that neighbors in the low-dimensional space can be adjusted flexibly to reflect the neighborhood structure in the high-dimensional space. Experiments on three learning tasks show that the manifold, or data structure, is preserved better with the proposed p-SNE than with SNE or t-SNE.
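To make the objective described in the abstract concrete, the following is a minimal sketch of the Kullback-Leibler cost between pairwise distributions in the two spaces, evaluated once with a Gaussian low-dimensional kernel (SNE) and once with a Student's t kernel (t-SNE). Several things here are assumptions rather than details from the paper: the symmetric (joint) form of the pairwise distributions, a single shared bandwidth `sigma` instead of a perplexity search, the toy random data, and all function names. The power-law kernel of p-SNE is not specified in the abstract, so it is not implemented; in this sketch it would enter as another `kernel` argument.

```python
import math
import random


def sq_dists(points):
    """Pairwise squared Euclidean distances as a nested list."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def joint_probs(d2, kernel):
    """Normalize kernel(d2[i][j]) over all pairs i != j into a joint distribution."""
    n = len(d2)
    w = [[kernel(d2[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def gaussian(sigma=1.0):
    # Assumption: one shared bandwidth; SNE and t-SNE normally tune it per point.
    return lambda d2: math.exp(-d2 / (2.0 * sigma ** 2))


def student_t(d2):
    # Student's t kernel with one degree of freedom, as used by t-SNE.
    return 1.0 / (1.0 + d2)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all pairs with p > 0; q is clipped at eps."""
    return sum(pij * math.log(pij / max(qij, eps))
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)


random.seed(0)
high = [[random.gauss(0, 1) for _ in range(10)] for _ in range(6)]  # toy 10-D data
low = [[random.gauss(0, 1) for _ in range(2)] for _ in range(6)]    # random 2-D embedding

p = joint_probs(sq_dists(high), gaussian())      # high-dimensional similarities
q_sne = joint_probs(sq_dists(low), gaussian())   # Gaussian in the low-dimensional space
q_tsne = joint_probs(sq_dists(low), student_t)   # heavy-tailed alternative

print("SNE objective:  ", kl_divergence(p, q_sne))
print("t-SNE objective:", kl_divergence(p, q_tsne))
```

An actual embedding method would minimize this cost with respect to the low-dimensional coordinates, typically by gradient descent; the sketch only evaluates the cost for a fixed random embedding.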
