A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks

Real-world data usually have high dimensionality, and mitigating the resulting curse of dimensionality is important. High-dimensional data typically lie on a coherent low-dimensional structure, so their true degrees of freedom are relatively small. Both global and local dimensionality reduction methods have been proposed to alleviate this problem. Most existing local dimensionality reduction methods obtain an embedding via eigenvalue or singular value decomposition, whose computational complexity is prohibitive for large amounts of data. Here we propose a novel local nonlinear approach named Vec2vec for general-purpose dimensionality reduction, which generalizes recent advances in word embedding learning to the dimensionality reduction of matrices. It obtains a nonlinear embedding using a neural network with only one hidden layer, reducing the computational complexity. To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points by exploiting random walk properties. Experiments demonstrate that Vec2vec is more efficient than several state-of-the-art local dimensionality reduction methods on large collections of high-dimensional data. Extensive classification and clustering experiments on eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis tests, and it is competitive with the recently developed state-of-the-art UMAP.
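The pipeline described above first builds a neighborhood similarity graph over the rows of the data matrix and then generates random-walk "sentences" of point indices, which can be fed to a skip-gram-style trainer as if they were word sequences. The following is a minimal sketch of that context-construction step, not the authors' implementation: it assumes cosine similarity for the k-nearest-neighbor graph and similarity-proportional transition probabilities, and all function names and parameters (`knn_graph`, `random_walks`, `k`, `num_walks`, `walk_len`) are illustrative.

```python
import numpy as np

def knn_graph(X, k=4):
    """Hypothetical kNN similarity graph over the rows of X (cosine similarity)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar rows
    w = np.take_along_axis(S, nbrs, axis=1)   # their similarity scores
    return nbrs, np.maximum(w, 0.0) + 1e-12   # clip to get valid positive weights

def random_walks(nbrs, w, num_walks=5, walk_len=10, seed=0):
    """Generate similarity-biased random walks; each walk is a 'sentence' of row ids."""
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(num_walks):
        for start in range(nbrs.shape[0]):
            walk = [start]
            for _ in range(walk_len - 1):
                cur = walk[-1]
                p = w[cur] / w[cur].sum()     # transition probs proportional to similarity
                walk.append(int(rng.choice(nbrs[cur], p=p)))
            walks.append(walk)
    return walks

X = np.random.default_rng(1).normal(size=(20, 50))   # toy 20-point, 50-dim matrix
walks = random_walks(*knn_graph(X, k=4))
```

Each walk plays the role of a sentence and each point index the role of a word, so the walks can be passed directly to any skip-gram word-embedding trainer; the learned word vectors are then the low-dimensional embeddings of the data points.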
