Kernel-based dimensionality reduction using Renyi's α-entropy measures of similarity

Dimensionality reduction (DR) aims to reveal salient properties of high-dimensional (HD) data in a low-dimensional (LD) representation space. Two elements determine the success of a DR approach: the definition of pairwise relations in the HD and LD spaces, and the measure of mismatch between these relations in the HD and LD representations of the data. This paper introduces a new DR method, termed kernel-based entropy dimensionality reduction (KEDR), that measures embedding quality through stochastic neighborhood preservation, using a Gram matrix estimate of Renyi's α-entropy. The proposed approach is a data-driven framework for information theoretic learning, based on infinitely divisible matrices. Rather than relying only on regular Renyi's entropies, KEDR also computes the embedding mismatch through a parameterized mixture of divergences, which improves the preservation of both local and global data structures. The approach is validated on both synthetic and real-world datasets and compared with several state-of-the-art algorithms, including the Stochastic Neighbor Embedding-like techniques, of which our DR approach is a data-driven extension (from the perspective of kernel-based Gram matrices). In terms of visual inspection and quantitative evaluation of neighborhood preservation, the results show that KEDR is a competitive and promising DR method.

A new dimensionality reduction method is proposed based on a kernel enhancement of stochastic neighborhood preservation. A Gram matrix estimation of Renyi's alpha-entropy is involved as the cost function. The proposed approach is a data-driven framework for information theoretic learning.
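As a rough illustration of what a Gram matrix estimate of Renyi's α-entropy can look like, the sketch below follows the matrix-based functional known from the infinitely divisible kernel literature, S_α(A) = log2(tr(A^α)) / (1 − α), where A is a Gram matrix normalized to unit trace. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions made for this example; the abstract does not specify them, and this is not KEDR's actual cost function.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gaussian-kernel Gram matrix of the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def renyi_entropy(K, alpha=2.0):
    """Matrix-based Renyi alpha-entropy (in bits) of a positive semidefinite Gram matrix K."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n = K.shape[0]
    d = np.sqrt(np.diag(K))
    # Normalize so that A has unit trace: A_ij = K_ij / (n * sqrt(K_ii * K_jj)).
    A = K / np.outer(d, d) / n
    # tr(A^alpha) equals the sum of the eigenvalues raised to alpha.
    eigvals = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))  # hypothetical HD sample
    print(renyi_entropy(gaussian_gram(X, sigma=1.0), alpha=2.0))
```

Two limiting cases help check the behavior: an identity Gram matrix over n points (all points mutually dissimilar) gives log2(n) bits, and an all-ones Gram matrix (all points identical under the kernel) gives 0 bits.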
