Potential-Based Algorithms in On-Line Prediction and Game Theory

In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the Λ-strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view of a large family of algorithms, we establish a connection between potential-based analyses in learning and their counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
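
To make the idea of a potential-based strategy concrete, the following is a minimal sketch, assuming the common formulation in which the forecaster weights each action in proportion to the gradient of a potential function evaluated at the cumulative regret vector. With an exponential potential the weights coincide with Hedge-style multiplicative weights; with a polynomial potential they depend only on the positive part of the regret. The function names and the parameters `eta` and `p` are illustrative assumptions, not notation taken from the paper.

```python
import math

def exp_weights(regret, eta=0.5):
    # Gradient of the exponential potential sum_i exp(eta * R_i), normalized.
    # Subtracting the max is only for numerical stability.
    m = max(regret)
    w = [math.exp(eta * (r - m)) for r in regret]
    s = sum(w)
    return [x / s for x in w]

def poly_weights(regret, p=2.0):
    # Gradient direction of a polynomial potential: (R_i)_+^(p-1), normalized.
    # Falls back to uniform weights when no action has positive regret.
    w = [max(r, 0.0) ** (p - 1) for r in regret]
    s = sum(w)
    n = len(regret)
    return [x / s for x in w] if s > 0 else [1.0 / n] * n

def run(losses, weight_fn):
    # losses: list of rounds, each a list of per-action losses in [0, 1].
    # Returns the forecaster's total expected loss and the final regret vector.
    n = len(losses[0])
    regret = [0.0] * n
    total = 0.0
    for loss in losses:
        w = weight_fn(regret)
        mixed = sum(wi * li for wi, li in zip(w, loss))
        total += mixed
        # Regret against action i grows by (forecaster loss - loss of i).
        regret = [r + mixed - li for r, li in zip(regret, loss)]
    return total, regret

if __name__ == "__main__":
    # Toy sequence: action 0 never errs, action 1 always errs.
    rounds = [[0.0, 1.0] for _ in range(20)]
    print(run(rounds, exp_weights)[0])
    print(run(rounds, poly_weights)[0])
```

Both weight functions plug into the same loop, which is the sense in which a single potential-based template can cover different algorithms; only the choice of potential changes.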
