Spectral Algorithms for Supervised Learning

We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. All of these algorithms are consistent kernel methods that are easy to implement. The intuition behind their derivation is that the same principle that numerically stabilizes a matrix inversion problem is also crucial to avoiding overfitting. The various methods share a common derivation but differ in their computational and theoretical properties. We describe examples of such algorithms, analyze their classification performance on several data sets, and discuss their applicability to real-world problems.
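To make the matrix-inversion analogy concrete, the sketch below (a minimal illustration, not the paper's implementation) contrasts two classical spectral filters applied to the eigenvalues of a kernel matrix: Tikhonov regularization, g_lam(s) = 1/(s + lam), and spectral cut-off (truncated eigendecomposition), g_lam(s) = 1/s for s >= lam and 0 otherwise. The Gaussian kernel, the function names, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # Gaussian (RBF) Gram matrix between the rows of X and the rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def spectral_fit(K, y, lam, method="tikhonov"):
    # Apply a spectral filter g_lam to the eigenvalues of K/n; the returned
    # coefficients c define the estimator f(x) = sum_i c_i k(x, x_i).
    n = len(y)
    evals, V = np.linalg.eigh(K / n)  # K/n = V diag(evals) V^T, evals >= 0
    if method == "tikhonov":
        # g_lam(s) = 1/(s + lam): shifts small eigenvalues away from zero,
        # i.e. the stabilized inverse (K + n*lam*I)^{-1} y.
        g = 1.0 / (evals + lam)
    elif method == "cutoff":
        # g_lam(s) = 1/s if s >= lam, else 0: truncated eigendecomposition.
        g = np.where(evals >= lam, 1.0 / np.clip(evals, lam, None), 0.0)
    else:
        raise ValueError(f"unknown filter: {method}")
    return V @ (g * (V.T @ (y / n)))

# Toy 1-D regression: both filters yield a stable fit where the raw
# inverse of the (numerically singular) kernel matrix would blow up.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
K = rbf_kernel(X, X)
c = spectral_fit(K, y, lam=1e-3, method="tikhonov")
X_new = np.linspace(-3.0, 3.0, 50)[:, None]
y_hat = rbf_kernel(X_new, X) @ c  # predictions at new points
```

As lam goes to 0 both filters reduce to the unstable pseudo-inverse of the kernel matrix; increasing lam trades variance for bias, which is exactly the overfitting control described above.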
