Single- and Multi-Distribution Dimensionality Reduction Approaches for Better Data Structure Capture

In recent years, the rapid expansion of digital technologies has vastly increased the volume of data to be explored, making dimensionality reduction an essential step in data exploration. The integrity of a dimensionality reduction technique relates to how well it maintains the data structure. Techniques such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) preserve the global distance ranking at the expense of neglecting small-distance preservation. Conversely, the structure capturing of other methods, such as Isomap, Locally Linear Embedding (LLE), Laplacian Eigenmaps, $t$-Stochastic Neighbour Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and TriMap, relies on the number of neighbours considered. This paper presents a dimensionality reduction technique, Same Degree Distribution (SDD), that does not rely on the number of neighbours, thanks to its use of degree-distributions in both the high- and low-dimensional spaces. The degree-distribution is similar to the Student-t distribution and is less expensive to compute than the Gaussian distribution; as such, it enables better global data preservation in less processing time. Moreover, to further improve structure capturing, SDD has been extended to Multi-SDD (MSDD), which employs several degree-distributions on top of SDD. The proposed approach and its extension outperformed eight benchmark methods on several popular synthetic and real datasets, including Iris, Breast Cancer, Swiss Roll, MNIST, and Make Blob, as evaluated by the co-ranking matrix and Kendall's Tau coefficient. In future work, we aim to approximate the number of distributions and their degrees in relation to a given dataset, and to reduce the computational complexity.
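Since the abstract names Kendall's Tau over pairwise-distance rankings as one of its quality measures, the following minimal Python sketch illustrates that check. PCA stands in for SDD purely for illustration, as no public SDD implementation is assumed; the metric itself is method-agnostic, and Iris is one of the datasets named above.

from scipy.spatial.distance import pdist
from scipy.stats import kendalltau
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris: 150 samples x 4 features, one of the datasets used in the paper.
X = load_iris().data

# PCA is a stand-in embedding here; SDD itself is not publicly assumed.
Y = PCA(n_components=2).fit_transform(X)

# Condensed vectors of pairwise distances in the original and embedded spaces.
d_high = pdist(X)
d_low = pdist(Y)

# Kendall's Tau over the two distance vectors measures how well the global
# distance ranking survives the embedding (1.0 = perfectly preserved).
tau, _ = kendalltau(d_high, d_low)
print(f"Kendall's Tau between distance rankings: {tau:.3f}")

A Tau close to 1 indicates that the embedding preserves the global distance ranking; the co-ranking matrix refines this single number by measuring rank agreement separately at each neighbourhood size.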
