Nonlinear Dimensionality Reduction by Unit Ball Embedding (UBE) and Its Application to Image Clustering

The paper presents an unsupervised nonlinear dimensionality reduction algorithm called Unit Ball Embedding (UBE). Many kinds of high-dimensional data, such as object or face images, lie on a union of low-dimensional subspaces, often called manifolds. The proposed method learns the structure of these manifolds by exploiting the local neighborhood arrangement around each point. It preserves local structure by minimizing a cost function that measures the discrepancy between pairwise similarities of points in the high-dimensional data and pairwise similarities of points in the low-dimensional embedding. The cost function is constructed so that the points of the low-dimensional embedding lie on a hypersphere. Visualizations of our method on several datasets show that it creates large gaps between the manifolds and maximizes their separability. As a result, it notably improves the quality of unsupervised machine learning tasks (e.g., clustering). UBE is successfully applied to image datasets of faces, handwritten digits, and objects, and clustering on the resulting low-dimensional embeddings shows significant improvement over existing dimensionality reduction methods.
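The abstract does not give UBE's exact cost function, but the idea it describes, matching high-dimensional and low-dimensional pairwise similarities while constraining the embedding to a hypersphere, can be sketched in the style of stochastic neighbor embedding. The sketch below is an illustrative stand-in, not the authors' algorithm: it uses a KL divergence between Gaussian similarity distributions as the discrepancy measure and projects the embedding onto the unit sphere after each gradient step. The function name `ube_sketch` and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def ube_sketch(X, dim=2, n_iter=200, lr=0.1, sigma=1.0, seed=0):
    """Illustrative SNE-style embedding with a unit-sphere constraint.

    NOTE: this is a hedged sketch of the general technique the abstract
    describes, not the UBE cost function from the paper itself.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # High-dimensional similarities: Gaussian kernel on squared
    # Euclidean distances, normalized into a joint distribution P.
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum()

    # Initialize low-dimensional points on the unit sphere
    # (the "unit ball" representation the abstract mentions).
    Y = rng.normal(size=(n, dim))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)

    for _ in range(n_iter):
        # Low-dimensional similarities Q, same Gaussian form.
        d = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        Q = np.exp(-d)
        np.fill_diagonal(Q, 0.0)
        Q /= Q.sum()

        # Gradient of KL(P || Q) for symmetric SNE with a Gaussian
        # low-dimensional kernel: 4 * sum_j (p_ij - q_ij)(y_i - y_j).
        diff = Y[:, None, :] - Y[None, :, :]
        grad = 4.0 * np.sum((P - Q)[:, :, None] * diff, axis=1)

        Y -= lr * grad
        # Enforce the hyper-spherical constraint by projecting back
        # onto the unit sphere after each step.
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)

    return Y
```

Projecting after every step is one simple way to keep points on the sphere; a Riemannian gradient step on the sphere would be a more principled alternative under the same constraint.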
