Potential-Based Algorithms in On-Line Prediction and Game Theory

In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire’s Hedge and MW, as well as the Λ-strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view of a large family of algorithms, we establish a connection between potential-based analyses in learning and their counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
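As a rough illustration of what a potential-based decision strategy can look like, the sketch below plays weights proportional to the gradient of a potential function evaluated at the current regret vector. It is a minimal sketch under stated assumptions, not the paper's own formulation: the function names, the learning rate `eta`, the exponent `p`, and the uniform fallback when the gradient vanishes are all hypothetical choices made for this example. With an exponential potential the weights coincide with the exponentially weighted (Hedge-style) weights; with a polynomial potential they take a p-norm-style form.

```python
import math

def potential_weights(regret, grad):
    """Weights proportional to the potential's gradient at the regret vector."""
    g = grad(regret)
    total = sum(g)
    n = len(regret)
    if total <= 0:
        # Assumption: fall back to uniform weights when the gradient is zero.
        return [1.0 / n] * n
    return [x / total for x in g]

def exp_grad(eta):
    """Gradient direction of an exponential potential, sum_i exp(eta * r_i)."""
    def grad(r):
        m = max(r)  # shift for numerical stability; the scale cancels on normalizing
        return [math.exp(eta * (x - m)) for x in r]
    return grad

def poly_grad(p):
    """Gradient direction of a polynomial potential, sum_i max(r_i, 0) ** p."""
    def grad(r):
        return [max(x, 0.0) ** (p - 1) for x in r]
    return grad

def run(losses, grad):
    """Play the strategy on a list of per-round loss vectors (one entry per action)."""
    n = len(losses[0])
    regret = [0.0] * n
    total_loss = 0.0
    for loss in losses:
        w = potential_weights(regret, grad)
        mixed = sum(wi * li for wi, li in zip(w, loss))
        total_loss += mixed
        # Regret to action i grows by (our loss) minus (loss of action i).
        regret = [r + mixed - li for r, li in zip(regret, loss)]
    return total_loss, regret
```

For example, `run([[0.0, 1.0]] * 3, exp_grad(1.0))` shifts weight toward the first action as its regret grows, while `poly_grad(2)` puts weight only on actions with positive regret.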
