Unsupervised Kernel Dimension Reduction

We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
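To make the idea of a kernel-based measure of independence concrete, the sketch below computes the biased empirical Hilbert-Schmidt Independence Criterion (HSIC), one commonly used measure of this kind. It is an illustration only and is not claimed to be the objective optimized in this work. The function names, the Gaussian kernel, and the fixed bandwidth are all assumptions made for the example.

```python
import math
import random


def gaussian_gram(points, bandwidth=1.0):
    """Gram matrix with K[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return gram


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical toy data: y_dep depends on x, y_ind is drawn independently.
random.seed(0)
x = [[random.gauss(0.0, 1.0)] for _ in range(60)]
y_dep = [[v[0] + 0.1 * random.gauss(0.0, 1.0)] for v in x]
y_ind = [[random.gauss(0.0, 1.0)] for _ in range(60)]

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
```

On this toy data the dependent pair should score noticeably higher than the independent pair. In a dimension-reduction setting, a score of this kind would be evaluated between the data and a candidate low-dimensional representation, and the representation would be chosen to make the score large.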
