An identity for kernel ridge regression

This paper derives an identity connecting the square loss of ridge regression run in on-line mode with the loss of the retrospectively best regressor. Several corollaries concerning the cumulative loss of on-line ridge regression are also obtained.
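To make the two quantities in this statement concrete, below is a minimal sketch of ridge regression run in on-line mode in its dual (kernel) form: at each step t the learner predicts γ_t = k_t′(K_{t−1} + aI)⁻¹ y_{1..t−1} from the examples seen so far, and only then observes y_t. The function names, the Gaussian kernel, and the synthetic data are illustrative assumptions, not taken from the paper; the closed form a·y′(K_T + aI)⁻¹y used below for the regularized loss of the retrospectively best regressor is the standard kernel ridge regression expression, not the paper's identity itself.

```python
import numpy as np

def rbf_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; an illustrative choice, not fixed by the paper."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def online_kernel_ridge(X, y, a=1.0, kernel=rbf_kernel):
    """Kernel ridge regression in on-line mode.

    At step t, predict y_t from (x_1, y_1), ..., (x_{t-1}, y_{t-1}) via the
    dual form gamma_t = k' (K + a I)^{-1} y, then reveal y_t.
    Returns the sequence of on-line predictions gamma_1, ..., gamma_T.
    """
    T = len(y)
    preds = np.zeros(T)
    for t in range(T):
        if t == 0:
            preds[t] = 0.0  # no data yet: predict zero
            continue
        # Gram matrix of past examples and kernel vector of the new point
        K = np.array([[kernel(X[i], X[j]) for j in range(t)] for i in range(t)])
        k = np.array([kernel(X[i], X[t]) for i in range(t)])
        preds[t] = k @ np.linalg.solve(K + a * np.eye(t), y[:t])
    return preds

# Usage: compare the cumulative on-line square loss with the regularized loss
# of the retrospectively best regressor in the RKHS, which for kernel ridge
# regression equals  min_f [ sum_t (f(x_t) - y_t)^2 + a ||f||^2 ]
#                  = a * y' (K_T + a I)^{-1} y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
a = 1.0

gamma = online_kernel_ridge(X, y, a=a)
online_loss = np.sum((gamma - y) ** 2)

K_T = np.array([[rbf_kernel(X[i], X[j]) for j in range(len(y))] for i in range(len(y))])
best_regularized_loss = a * (y @ np.linalg.solve(K_T + a * np.eye(len(y)), y))
```

The on-line loss is never smaller than the best regularized loss; the paper's contribution is an exact identity (rather than only an inequality) relating the two.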
