An Identity for Kernel Ridge Regression

This paper provides a probabilistic derivation of an identity connecting the square loss of ridge regression in on-line mode with the loss of the retrospectively best regressor. Corollaries of the identity, which provide upper bounds on the cumulative loss of on-line ridge regression, are also discussed.
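As a concrete illustration, the identity in the finite-dimensional (linear kernel) case can be checked numerically. The sketch below is an assumption-laden reading of the standard formulation: with ridge parameter a > 0, on-line predictions gamma_t computed from the first t-1 examples, and A_{t-1} = aI + sum_{s<t} x_s x_s', the sum of the on-line square losses deflated by 1 + x_t' A_{t-1}^{-1} x_t equals the regularized loss of the retrospectively best linear regressor. All variable names and the data here are illustrative, not from the paper.

```python
import numpy as np

# Hedged numerical sketch of the ridge-regression identity:
#   sum_t (y_t - gamma_t)^2 / (1 + x_t' A_{t-1}^{-1} x_t)
#     = min_theta [ a ||theta||^2 + sum_t (y_t - theta' x_t)^2 ],
# where gamma_t is the on-line ridge prediction from examples 1..t-1
# and A_{t-1} = a I + sum_{s<t} x_s x_s'.

rng = np.random.default_rng(0)
a = 0.7                              # ridge parameter (any a > 0)
T, d = 20, 3                         # illustrative sizes
X = rng.normal(size=(T, d))
y = rng.normal(size=T)

A = a * np.eye(d)                    # A_0
b = np.zeros(d)                      # running sum of y_s x_s
lhs = 0.0
for t in range(T):
    x = X[t]
    Ainv = np.linalg.inv(A)
    gamma = x @ Ainv @ b             # on-line ridge prediction
    lhs += (y[t] - gamma) ** 2 / (1.0 + x @ Ainv @ x)
    A += np.outer(x, x)              # update A_{t-1} -> A_t
    b += y[t] * x

# Regularized loss of the retrospectively best (batch ridge) regressor.
theta = np.linalg.solve(a * np.eye(d) + X.T @ X, X.T @ y)
rhs = a * theta @ theta + np.sum((y - X @ theta) ** 2)

print(lhs, rhs)
assert np.isclose(lhs, rhs)          # the two sides coincide
```

The kernel version replaces inner products with kernel evaluations via the representer theorem; the linear case above is enough to see the deflation factor 1 + x_t' A_{t-1}^{-1} x_t at work.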