Competitive On-line Statistics

A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree under the influence of Dawid's prequential statistics). In this approach, which we call “competitive on-line statistics”, it is not assumed that the data are generated by any stochastic mechanism. The bounds derived for the performance of competitive on-line statistical procedures are guaranteed to hold, not merely with high probability or on average. This paper reviews some results in this area. Its new material includes proofs of the performance bounds for the Aggregating Algorithm in the problem of linear regression with square loss.
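The Aggregating Algorithm for linear regression with square loss can be illustrated with a short sketch. The code below assumes the form commonly given in the literature for a Gaussian prior with parameter a > 0 (often called the Vovk-Azoury-Warmuth forecaster). At step t the prediction is γ_t = b_{t-1}'(aI + Σ_{s≤t} x_s x_s')^{-1} x_t with b_{t-1} = Σ_{s<t} y_s x_s, so the current signal x_t enters the matrix before y_t is revealed. The names `AggregatingRegressor` and `run_online` are hypothetical, and this is an illustrative sketch rather than the paper's own presentation. The demo compares the on-line loss with inf_θ(L_T(θ) + a‖θ‖²) + nY² ln(1 + TB²/a), the form of bound reported for this algorithm when |y_t| ≤ Y and every |x_{t,i}| ≤ B. No stochastic model of the data is assumed.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


class AggregatingRegressor:
    """Hypothetical sketch of the Aggregating Algorithm for linear regression
    under square loss, assuming a Gaussian prior with parameter a on the weights."""

    def __init__(self, dim: int, a: float = 1.0) -> None:
        self.dim = dim
        # a_inv holds (a I + sum_{s<t} x_s x_s')^{-1}; it starts as I / a.
        self.a_inv: Matrix = [[1.0 / a if i == j else 0.0 for j in range(dim)]
                              for i in range(dim)]
        # b holds sum_{s<t} y_s x_s.
        self.b: Vector = [0.0] * dim

    def predict(self, x: Sequence[float]) -> float:
        # gamma_t = b' (A + x x')^{-1} x. By Sherman-Morrison,
        # (A + x x')^{-1} x = A^{-1} x / (1 + x' A^{-1} x), so no state changes here.
        u = mat_vec(self.a_inv, x)
        return dot(self.b, u) / (1.0 + dot(x, u))

    def update(self, x: Sequence[float], y: float) -> None:
        # Rank-one Sherman-Morrison update of A^{-1} after adding x x'
        # (A^{-1} is symmetric, so x' A^{-1} equals u').
        u = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, u)
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]

    def ridge_weights(self) -> Vector:
        # theta minimizing sum_s (y_s - theta' x_s)^2 + a ||theta||^2 on the data seen so far.
        return mat_vec(self.a_inv, self.b)


def run_online(data: Sequence[Tuple[Sequence[float], float]],
               dim: int, a: float = 1.0) -> Tuple[float, float]:
    """Play the on-line protocol; return (learner's loss, best regularized loss)."""
    model = AggregatingRegressor(dim, a)
    loss = 0.0
    for x, y in data:
        gamma = model.predict(x)  # prediction is made before y is revealed
        loss += (y - gamma) ** 2
        model.update(x, y)
    theta = model.ridge_weights()
    best = sum((y - dot(theta, x)) ** 2 for x, y in data) + a * dot(theta, theta)
    return loss, best


if __name__ == "__main__":
    # Synthetic, deterministic sequence; the bound needs no noise model.
    data = [([math.sin(t), math.cos(0.5 * t)],
             0.7 * math.sin(t) - 0.3 * math.cos(0.5 * t) + 0.2 * math.sin(3 * t))
            for t in range(1, 51)]
    a, n = 1.0, 2
    loss, best = run_online(data, dim=n, a=a)
    Y = max(abs(y) for _, y in data)
    B = max(abs(xi) for x, _ in data for xi in x)
    bound = best + n * Y ** 2 * math.log(1 + len(data) * B ** 2 / a)
    print(f"AAR loss {loss:.4f}, best regularized loss {best:.4f}, bound {bound:.4f}")
```

Maintaining the inverse with Sherman-Morrison rank-one updates keeps each step at O(n²) arithmetic operations instead of re-inverting an n × n matrix. The prediction step uses the identity (A + xx')^{-1}x = A^{-1}x / (1 + x'A^{-1}x), so calling `predict` does not change the learner's state.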

[1]  R. F., et al.  Mathematical Statistics, 1944, Nature.

[2]  Ray J. Solomonoff, et al.  A Formal Theory of Inductive Inference. Part I, 1964, Inf. Control.

[3]  Ray J. Solomonoff, et al.  A Formal Theory of Inductive Inference. Part II, 1964, Inf. Control.

[4]  C. S. Wallace, et al.  An Information Measure for Classification, 1968, Comput. J.

[5]  Yu. A. Gur'yan, et al.  Parts I and II, 1982.

[6]  J. Rissanen  A Universal Prior for Integers and Estimation by Minimum Description Length, 1983.

[7]  A. P. Dawid, et al.  Present position and potential developments: some personal views, 1984.

[8]  C. S. Wallace, et al.  Estimation and Inference by Compact Coding, 1987.

[9]  Alfredo De Santis, et al.  Learning probabilistic prediction functions, 1988, 29th Annual Symposium on Foundations of Computer Science.

[10]  Andrew R. Barron, et al.  Information-theoretic asymptotics of Bayes methods, 1990, IEEE Trans. Inf. Theory.

[11]  Vladimir Vovk, et al.  Aggregating strategies, 1990, COLT '90.

[12]  Dean Phillips Foster  Prediction in the Worst Case, 1991.

[13]  A. Dawid  Fisherian Inference in Likelihood and Prequential Frames of Reference, 1991.

[14]  A. Barron, et al.  Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.

[15]  Vladimir Vovk, et al.  Universal Forecasting Algorithms, 1992, Inf. Comput.

[16]  V. Vovk  A logic of probability, with application to the foundations of statistics, 1993.

[17]  David Haussler, et al.  How to use expert advice, 1993, STOC.

[18]  Ming Li, et al.  An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.

[19]  Philip M. Long, et al.  Worst-Case Quadratic Loss Bounds for On-line Prediction of Linear Functions by Gradient Descent, 1993.

[20]  Manfred K. Warmuth, et al.  The Weighted Majority Algorithm, 1994, Inf. Comput.

[21]  Robert E. Schapire, et al.  Predicting Nearly As Well As the Best Pruning of a Decision Tree, 1995, COLT '95.

[22]  David Haussler, et al.  Tight worst-case loss bounds for predicting with expert advice, 1994, EuroCOLT.

[23]  Nicolò Cesa-Bianchi, et al.  Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[24]  Yoav Freund, et al.  A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[25]  Vladimir Vovk, et al.  A game of prediction with expert advice, 1995, COLT '95.

[26]  Erik Ordentlich, et al.  Universal portfolios with side information, 1996, IEEE Trans. Inf. Theory.

[27]  Darrell D. E. Long, et al.  A dynamic disk spin-down technique for mobile computing, 1996, MobiCom '96.

[28]  Philip M. Long, et al.  Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, 1996, IEEE Trans. Neural Networks.

[29]  T. Cover  Universal Portfolios, 1996.

[30]  Yoav Freund, et al.  Game theory, on-line prediction and boosting, 1996, COLT '96.

[31]  Yoav Freund, et al.  Predicting a binary sequence almost as well as the optimal biased coin, 2003, COLT '96.

[32]  Yoram Singer, et al.  Context-sensitive learning methods for text categorization, 1996, SIGIR '96.

[33]  A. Blum, et al.  Universal portfolios with and without transaction costs, 1997, COLT '97.

[34]  Vladimir Vovk, et al.  Competitive On-line Linear Regression, 1997, NIPS.

[35]  Vladimir Vovk, et al.  Derandomizing Stochastic Prediction Strategies, 1997, COLT '97.

[36]  Yoram Singer, et al.  Using and combining predictors that specialize, 1997, STOC '97.

[37]  Manfred K. Warmuth, et al.  Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.

[38]  Avrim Blum, et al.  On-line Learning and the Metrical Task System Problem, 1997, COLT '97.

[39]  David Haussler, et al.  Sequential Prediction of Individual Sequences Under General Loss Functions, 1998, IEEE Trans. Inf. Theory.

[40]  Kenji Yamanishi, et al.  A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.

[41]  Nicolò Cesa-Bianchi, et al.  Finite-Time Regret Bounds for the Multiarmed Bandit Problem, 1998, ICML.

[42]  N. Cesa-Bianchi, et al.  On Bayes Methods for On-Line Boolean Prediction, 1998, Annual Conference on Computational Learning Theory.

[43]  N. Draper, et al.  Applied Regression Analysis, 1998.

[44]  Andrew C. Singer, et al.  Universal data compression and linear prediction, 1998, Proceedings DCC '98 Data Compression Conference.

[45]  Peter Grünwald  The minimum description length principle and reasoning under uncertainty, 1998.

[46]  Mark Herbster, et al.  Tracking the best regressor, 1998, COLT '98.

[47]  Vladimir Vovk, et al.  Universal portfolio selection, 1998, COLT '98.

[48]  Kenji Yamanishi, et al.  Minimax relative loss analysis for sequential prediction algorithms using parametric hypotheses, 1998, COLT '98.

[49]  Y. Freund, et al.  Adaptive game playing using multiplicative weights, 1999.

[50]  Alexander Gammerman, et al.  Complexity Approximation Principle, 1999, Comput. J.

[51]  Yuri Kalnishkan, et al.  General Linear Relations among Different Types of Predictive Complexity, 1999, ALT.

[52]  Manfred K. Warmuth, et al.  Averaging Expert Predictions, 1999, EuroCOLT.

[53]  Jürgen Forster, et al.  On Relative Loss Bounds in Generalized Linear Regression, 1999, FCT.

[54]  Richard F. Gunst, et al.  Applied Regression Analysis, 1999, Technometrics.

[55]  Yuri Kalnishkan  Linear relations between square-loss and Kolmogorov complexity, 1999, COLT '99.

[56]  V. V. V'yugin  Most Sequences Are Predictable, 1999.

[57]  Andrew R. Barron, et al.  Asymptotic minimax regret for data compression, gambling, and prediction, 1997, IEEE Trans. Inf. Theory.

[58]  Yuri Kalnishkan, et al.  Complexity Approximation Principle and Rissanen's Approach to Real-Valued Parameters, 2000, ECML.

[59]  Vladimir Vovk  Probability theory for the Brier game, 2001, Theor. Comput. Sci.

[60]  Vladimir Vovk, et al.  Predicting nearly as well as the best pruning of a decision tree through dynamic programming scheme, 2001, Theor. Comput. Sci.

[61]  G. Shafer, et al.  Probability and Finance: It's Only a Game!, 2001.

[62]  William H. Press, et al.  Numerical recipes in C, 2002.

[63]  Vladimir V. V'yugin  Does snooping help?, 2002, Theor. Comput. Sci.

[64]  D. Pfeffermann, et al.  Small area estimation, 2011.