An identity for kernel ridge regression

This paper derives an identity connecting the square loss of ridge regression run in on-line mode with the loss of the retrospectively best regressor. Several corollaries concerning the cumulative loss of on-line ridge regression are also obtained.
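To make the two quantities in this statement concrete, below is a minimal sketch of ridge regression run in on-line mode in its dual (kernel) form: at each step t the learner predicts γ_t = k_t′(K_{t−1} + aI)⁻¹ y_{1..t−1} from the examples seen so far, and only then observes y_t. The function names, the Gaussian kernel, and the synthetic data are illustrative assumptions, not taken from the paper; the closed form a·y′(K_T + aI)⁻¹y used below for the regularized loss of the retrospectively best regressor is the standard kernel ridge regression expression, not the paper's identity itself.

```python
import numpy as np

def rbf_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; an illustrative choice, not fixed by the paper."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def online_kernel_ridge(X, y, a=1.0, kernel=rbf_kernel):
    """Kernel ridge regression in on-line mode.

    At step t, predict y_t from (x_1, y_1), ..., (x_{t-1}, y_{t-1}) via the
    dual form gamma_t = k' (K + a I)^{-1} y, then reveal y_t.
    Returns the sequence of on-line predictions gamma_1, ..., gamma_T.
    """
    T = len(y)
    preds = np.zeros(T)
    for t in range(T):
        if t == 0:
            preds[t] = 0.0  # no data yet: predict zero
            continue
        # Gram matrix of past examples and kernel vector of the new point
        K = np.array([[kernel(X[i], X[j]) for j in range(t)] for i in range(t)])
        k = np.array([kernel(X[i], X[t]) for i in range(t)])
        preds[t] = k @ np.linalg.solve(K + a * np.eye(t), y[:t])
    return preds

# Usage: compare the cumulative on-line square loss with the regularized loss
# of the retrospectively best regressor in the RKHS, which for kernel ridge
# regression equals  min_f [ sum_t (f(x_t) - y_t)^2 + a ||f||^2 ]
#                  = a * y' (K_T + a I)^{-1} y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
a = 1.0

gamma = online_kernel_ridge(X, y, a=a)
online_loss = np.sum((gamma - y) ** 2)

K_T = np.array([[rbf_kernel(X[i], X[j]) for j in range(len(y))] for i in range(len(y))])
best_regularized_loss = a * (y @ np.linalg.solve(K_T + a * np.eye(len(y)), y))
```

The on-line loss is never smaller than the best regularized loss; the paper's contribution is an exact identity (rather than only an inequality) relating the two.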
