Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” of X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we establish a general nonparametric characterization of conditional independence using covariance operators on a reproducing kernel Hilbert space. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
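To make the contrast-function idea concrete, below is a minimal NumPy sketch of a kernel-based objective of this general type: it scores a candidate projection matrix B by how much of the dependence between Y and X survives in the projected data XB, using centered Gram matrices and a small regularizer. The Gaussian kernel, the bandwidths sigma_x and sigma_y, the regularization constant eps, and all function names are illustrative assumptions, not the paper's own notation or implementation.

    import numpy as np

    def centered_gram(data, sigma):
        # Gaussian-kernel Gram matrix, centered as H K H with H = I - (1/n) 1 1^T.
        sq = np.sum(data ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
        K = np.exp(-d2 / (2.0 * sigma ** 2))
        n = data.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H

    def subspace_contrast(B, X, Y, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
        # Kernel-based score of how well the subspace spanned by the columns of B
        # preserves the statistical relationship between X and Y; smaller is better.
        # Illustrative objective: trace of G_Y (G_Z + n*eps*I)^{-1} on projected data Z = X B.
        n = X.shape[0]
        Z = X @ B
        Gz = centered_gram(Z, sigma_x)
        Gy = centered_gram(np.asarray(Y).reshape(n, -1), sigma_y)
        return float(np.trace(Gy @ np.linalg.inv(Gz + n * eps * np.eye(n))))

    # Toy usage: score a random 1-dimensional candidate subspace of a 3-dimensional X.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
    B = np.linalg.qr(rng.normal(size=(3, 1)))[0]  # random orthonormal starting direction
    print(subspace_contrast(B, X, Y))

In practice the estimated effective subspace would be obtained by minimizing such an objective over semi-orthogonal matrices B, for example by gradient descent with re-orthonormalization of B after each step.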
