Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” of X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we establish a general nonparametric characterization of conditional independence using covariance operators on a reproducing kernel Hilbert space. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
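To make the contrast-function idea concrete, below is a minimal NumPy sketch of a kernel-based objective of this general type: it scores a candidate projection matrix B by how much of the dependence between Y and X survives in the projected data XB, using centered Gram matrices and a small regularizer. The Gaussian kernel, the bandwidths sigma_x and sigma_y, the regularization constant eps, and all function names are illustrative assumptions, not the paper's own notation or implementation.

    import numpy as np

    def centered_gram(data, sigma):
        # Gaussian-kernel Gram matrix, centered as H K H with H = I - (1/n) 1 1^T.
        sq = np.sum(data ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
        K = np.exp(-d2 / (2.0 * sigma ** 2))
        n = data.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H

    def subspace_contrast(B, X, Y, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
        # Kernel-based score of how well the subspace spanned by the columns of B
        # preserves the statistical relationship between X and Y; smaller is better.
        # Illustrative objective: trace of G_Y (G_Z + n*eps*I)^{-1} on projected data Z = X B.
        n = X.shape[0]
        Z = X @ B
        Gz = centered_gram(Z, sigma_x)
        Gy = centered_gram(np.asarray(Y).reshape(n, -1), sigma_y)
        return float(np.trace(Gy @ np.linalg.inv(Gz + n * eps * np.eye(n))))

    # Toy usage: score a random 1-dimensional candidate subspace of a 3-dimensional X.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
    B = np.linalg.qr(rng.normal(size=(3, 1)))[0]  # random orthonormal starting direction
    print(subspace_contrast(B, X, Y))

In practice the estimated effective subspace would be obtained by minimizing such an objective over semi-orthogonal matrices B, for example by gradient descent with re-orthonormalization of B after each step.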
