An unsupervised data projection that preserves the cluster structure

In this paper we propose a new unsupervised dimensionality reduction algorithm that seeks a projection that optimally preserves the cluster structure of the data in the original space. Formally, we seek a projection that maximizes the mutual information between data points and clusters in the projected space. To compute the mutual information, we neither assume that the data are given as distributions nor impose a parametric model on the within-cluster distributions. Instead, we use a non-parametric estimate of the average cluster entropy and search for a linear projection and a clustering that together maximize the estimated mutual information between the projected data points and the clusters. We demonstrate improved performance on both synthetic and real-world examples.
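
As a rough illustration of the objective described above, the mutual information between projected points and clusters can be written as I(X;C) = H(X) - sum_c p(c) H(X | C = c) and scored with a non-parametric entropy estimate. The sketch below is not the paper's implementation. It assumes, purely for illustration, an entropy surrogate based on the mean log pairwise distance with additive constants dropped, and it only scores a given projection and clustering rather than searching for them. The names `project`, `mean_log_distance_entropy`, and `mutual_information_score` are hypothetical.

```python
import math
import random


def project(points, A):
    """Apply a linear projection; each row of A is one output direction."""
    return [[sum(a * x for a, x in zip(row, p)) for row in A] for p in points]


def mean_log_distance_entropy(points):
    """Entropy surrogate (assumption): d * mean log pairwise distance.

    Additive constants are dropped, so the value is only meaningful
    when comparing sets of points in the same dimension.
    """
    n = len(points)
    if n < 2:
        return 0.0
    d = len(points[0])
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])
            total += math.log(max(dist, 1e-12))  # guard against log(0)
            pairs += 1
    return d * total / pairs


def mutual_information_score(points, labels, A):
    """Estimate H(X) - sum_c p(c) H(X | C = c) in the projected space."""
    z = project(points, A)
    n = len(z)
    h_cond = 0.0
    for c in set(labels):
        members = [zi for zi, lab in zip(z, labels) if lab == c]
        h_cond += len(members) / n * mean_log_distance_entropy(members)
    return mean_log_distance_entropy(z) - h_cond


# Toy data: two clusters separated along the first axis,
# with large cluster-independent noise along the second axis.
rng = random.Random(0)
points, labels = [], []
for label, center in enumerate((-5.0, 5.0)):
    for _ in range(40):
        points.append([rng.gauss(center, 0.5), rng.gauss(0.0, 3.0)])
        labels.append(label)

score_x = mutual_information_score(points, labels, [[1.0, 0.0]])
score_y = mutual_information_score(points, labels, [[0.0, 1.0]])
print("score along axis 1:", score_x)
print("score along axis 2:", score_y)
```

On this toy data the projection onto the first axis, which keeps the two clusters apart, should score higher than the projection onto the noisy second axis. A full method would also have to optimize over the projection matrix and the cluster assignments, which this sketch does not attempt.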
