The representer theorem for Hilbert spaces: a necessary and sufficient condition

The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class has minimizers that lie in the finite-dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with a differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend this result by weakening the assumptions on the regularization term. In particular, the main result implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee the existence of a representer theorem for any choice of the data.
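For concreteness, here is a minimal sketch of the setting in standard RKHS notation (the kernel k, the space \mathcal{H}_k, the loss c, and the data (x_i, y_i) are notational assumptions of this sketch, not fixed by the abstract). The functionals in question have the form

    J(f) = c\bigl((x_1, y_1, f(x_1)), \ldots, (x_m, y_m, f(x_m))\bigr) + R(f), \qquad f \in \mathcal{H}_k,

and J admits a linear representer theorem if it has a minimizer in the span of the representers of the data, that is, of the form

    f^{\star}(\cdot) = \sum_{i=1}^{m} \alpha_i \, k(\cdot, x_i), \qquad \alpha_1, \ldots, \alpha_m \in \mathbb{R}.

A regularization term R is radial and nondecreasing when R(f) = h\bigl(\|f\|_{\mathcal{H}_k}\bigr) for some nondecreasing h : [0, \infty) \to \mathbb{R}.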
