Multi-Output Regularized Feature Projection

Dimensionality reduction by feature projection is widely used in pattern recognition, information retrieval, and statistics. When there are some outputs available (e.g., regression values or classification results), it is often beneficial to consider supervised projection, which is based not only on the inputs, but also on the target values. While this applies to a single-output setting, we are more interested in applications with multiple outputs, where several tasks need to be learned simultaneously. In this paper, we introduce a novel projection approach called multi-output regularized feature projection (MORP), which preserves the information of input features and, meanwhile, captures the correlations between inputs/outputs and (if applicable) between multiple outputs. This is done by introducing a latent variable model on the joint input-output space and minimizing the reconstruction errors for both inputs and outputs. It turns out that the mappings can be found by solving a generalized eigenvalue problem and are ready to extend to nonlinear mappings. Prediction accuracy can be greatly improved by using the new features since the structure of outputs is explored. We validate our approach in two applications. In the first setting, we predict users' preferences for a set of paintings. The second is concerned with image and text categorization where each image (or document) may belong to multiple categories. The proposed algorithm produces very encouraging results in both settings

[1]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[2]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[3]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[4]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[5]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[6]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[7]  William W. Cohen,et al.  Recommendation as Classification: Using Social and Content-Based Information in Recommendation , 1998, AAAI/IAAI.

[8]  J. Wade Davis,et al.  Statistical Pattern Recognition , 2003, Technometrics.

[9]  Bernhard Schölkopf,et al.  Kernel Dependency Estimation , 2002, NIPS.

[10]  Michael I. Jordan,et al.  Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004, J. Mach. Learn. Res..

[11]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[12]  Christian Böhm,et al.  Fast parallel similarity search in multimedia databases , 1997, SIGMOD '97.

[13]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[14]  Roman Rosipal,et al.  Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space , 2002, J. Mach. Learn. Res..

[15]  Volker Tresp,et al.  A nonparametric hierarchical bayesian framework for information filtering , 2004, SIGIR '04.

[16]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[17]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[18]  Anton Schwaighofer,et al.  Hierarchical Bayesian modelling with Gaus-sian processes , 2005, NIPS 2005.

[19]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[20]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[21]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[22]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[23]  R. Tibshirani,et al.  Sparse Principal Component Analysis , 2006 .

[24]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[25]  Gene H. Golub,et al.  Matrix computations , 1983 .

[26]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, International Conference on Artificial Neural Networks.