Leading Strategies in Competitive On-Line Prediction

We start from a simple asymptotic result for on-line regression with the quadratic loss function: the class of continuous limited-memory prediction strategies admits a “leading prediction strategy”. The leading strategy not only asymptotically performs at least as well as any continuous limited-memory strategy; in addition, the excess loss of any continuous limited-memory strategy is determined by how closely it imitates the leading strategy. More specifically, for any class of prediction strategies constituting a reproducing kernel Hilbert space we construct a leading strategy, in the sense that the loss of any prediction strategy whose norm is not too large is determined by how closely it imitates the leading strategy. These results are extended to the loss functions given by Bregman divergences and by strictly proper scoring rules.
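The key property above — that a competitor's excess quadratic loss is governed by its cumulative squared distance from the leading strategy — can be checked numerically in a toy setting. The sketch below is purely illustrative and is not the paper's construction: the paper builds its leading strategy from a reproducing kernel Hilbert space, whereas here, for i.i.d. uniform outcomes, the constant prediction 0.5 (the conditional mean) stands in for the leading strategy.

```python
import random

# Toy on-line regression protocol with quadratic loss: at each step t a
# strategy outputs a prediction p_t in [0, 1], the outcome y_t in [0, 1]
# is revealed, and the strategy suffers loss (p_t - y_t)^2.
# Stand-in "leading" strategy: the constant 0.5, which is the conditional
# mean of i.i.d. uniform outcomes.  This is an assumption for illustration,
# not the RKHS-based construction of the paper.

random.seed(0)
T = 10_000
y = [random.random() for _ in range(T)]   # i.i.d. uniform outcomes

lead = [0.5] * T                          # toy "leading" predictions
comp = [0.5] + y[:-1]                     # competitor: repeat last outcome

loss_lead = sum((p - o) ** 2 for p, o in zip(lead, y))
loss_comp = sum((p - o) ** 2 for p, o in zip(comp, y))
imitation = sum((p - q) ** 2 for p, q in zip(comp, lead))

# Excess loss of the competitor tracks its imitation gap, up to
# martingale fluctuations of order sqrt(T).
excess = loss_comp - loss_lead
print(round(excess, 1), round(imitation, 1))  # nearly equal, up to noise
```

Here the identity behind the picture is elementary: (p − y)² = (p − 0.5)² + 2(p − 0.5)(0.5 − y) + (0.5 − y)², and the cross term sums to a quantity of order √T because the outcome is independent of the prediction.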
