Multi-output learning via spectral filtering

In this paper we study a class of regularized kernel methods for multi-output learning that are based on filtering the spectrum of the kernel matrix. The methods considered include Tikhonov regularization as a special case, as well as interesting alternatives such as vector-valued extensions of L2 boosting and other iterative schemes. Computational properties are discussed for various examples of kernels for vector-valued functions, and the benefits of iterative techniques are illustrated. Generalizing previous results for the scalar case, we derive a finite-sample bound on the excess risk of the obtained estimator, which allows us to prove consistency for both regression and multi-category classification. Finally, we present promising results of the proposed algorithms on artificial and real data.
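To make the spectral-filtering viewpoint concrete, the sketch below applies two standard filter functions to the eigendecomposition of the kernel matrix: the Tikhonov filter g_lambda(s) = 1/(s + lambda) and a Landweber-type (L2-boosting) filter obtained by truncating the Neumann series. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it uses a scalar Gaussian kernel applied independently to each output component rather than an operator-valued kernel, and the function and parameter names (gaussian_kernel, spectral_filter_coefficients, lam, n_iter) are hypothetical choices for this sketch.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gaussian kernel matrix between the rows of X1 and X2 (illustrative scalar kernel).
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-d2 / (2.0 * sigma**2))

def spectral_filter_coefficients(K, Y, method="tikhonov", lam=1e-2, n_iter=100):
    # Returns coefficients C so that the estimator is f(x) = sum_i k(x, x_i) C[i, :].
    # The filter g is applied to the eigenvalues of the normalized kernel matrix K/n.
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)
    if method == "tikhonov":
        # Tikhonov (ridge) filter: g(s) = 1 / (s + lam)
        g = 1.0 / (evals + lam)
    elif method == "landweber":
        # Landweber / L2-boosting filter: g(s) = tau * sum_{j<n_iter} (1 - tau*s)^j,
        # i.e. the result of n_iter gradient-descent steps on the empirical risk.
        tau = 1.0 / max(evals.max(), 1e-12)
        g = np.array([tau * sum((1.0 - tau * s)**j for j in range(n_iter)) for s in evals])
    else:
        raise ValueError("unknown filter: %s" % method)
    # C = g(K/n) Y / n, computed in the eigenbasis.
    return evecs @ (g[:, None] * (evecs.T @ Y)) / n

# Toy multi-output regression: two output components of a one-dimensional input.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])]) + 0.1 * rng.standard_normal((80, 2))
K = gaussian_kernel(X, X, sigma=0.5)
C = spectral_filter_coefficients(K, Y, method="tikhonov", lam=1e-3)
X_test = np.linspace(-3, 3, 200)[:, None]
Y_hat = gaussian_kernel(X_test, X, sigma=0.5) @ C  # predictions for both outputs at once
```

Swapping `method="tikhonov"` for `method="landweber"` replaces the matrix inversion implicit in the Tikhonov filter with an iterative scheme whose iteration count plays the role of the regularization parameter, which is the kind of computational trade-off the iterative techniques in the paper exploit.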
