Adaptive game playing using multiplicative weights

Abstract: We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone and Warmuth, is analyzed using the Kullback–Leibler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our game-playing algorithm is proved to be optimal in a very strong sense. Journal of Economic Literature Classification Numbers: C44, C70, D83.
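
As a rough illustration of the multiplicative-weight idea named in the abstract, the following Python sketch plays a repeated game as the row player. It is a minimal sketch under stated assumptions, not necessarily the exact algorithm or parameter tuning analyzed in the paper: the function name `multiplicative_weights`, the layout of `loss_matrix`, the fixed parameter `beta`, and the update rule (multiply each row's weight by `beta` raised to that row's loss) are illustrative choices, and losses are assumed to lie in [0, 1].

```python
def multiplicative_weights(loss_matrix, column_choices, beta=0.9):
    """Play a repeated game as the row player (illustrative sketch).

    loss_matrix[i][j] is the row player's loss, assumed to lie in [0, 1],
    when the row player picks row i and the opponent picks column j.
    column_choices is the opponent's sequence of column indices.
    Returns (average expected loss, final mixed strategy over rows).
    """
    n_rows = len(loss_matrix)
    weights = [1.0] * n_rows          # start from uniform weights
    cumulative_loss = 0.0

    for j in column_choices:
        # The mixed strategy is the normalized weight vector.
        total = sum(weights)
        strategy = [w / total for w in weights]

        # Expected loss of the mixed strategy against column j.
        cumulative_loss += sum(p * loss_matrix[i][j]
                               for i, p in enumerate(strategy))

        # Multiplicative update: rows with high loss are down-weighted.
        weights = [w * beta ** loss_matrix[i][j]
                   for i, w in enumerate(weights)]

    total = sum(weights)
    final_strategy = [w / total for w in weights]
    return cumulative_loss / len(column_choices), final_strategy


if __name__ == "__main__":
    # Hypothetical 2x2 game; the opponent always plays column 0,
    # so row 0 (loss 0) is the best fixed strategy.
    game = [[0.0, 1.0],
            [1.0, 0.0]]
    avg_loss, strategy = multiplicative_weights(game, [0] * 50, beta=0.5)
    print(avg_loss, strategy)
```

In this toy run the weight on the losing row shrinks geometrically, so the average loss moves toward the loss of the best fixed row, which is the kind of behavior the abstract's guarantee describes. The choice of `beta` here is arbitrary and is not tuned as in the paper's analysis.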

[1]  Philip Wolfe,et al.  Contributions to the theory of games , 1953 .

[2]  Koichi Miyazawa  David Blackwell, M.A. Girshick: Theory of Games and Statistical Decisions , 1955 .

[3]  D. Blackwell  An analog of the minimax theorem for vector payoffs , 1956 .

[4]  James Hannan,et al.  Approximation to Bayes Risk in Repeated Play , 1958 .

[5]  Journal of the Association for Computing Machinery , 1961, Nature.

[6]  W. Hoeffding  Probability Inequalities for Sums of Bounded Random Variables , 1963 .

[7]  Gerald S. Rogers,et al.  Mathematical Statistics: A Decision Theoretic Approach , 1967 .

[8]  Jacob Ziv,et al.  Coding theorems for individual sequences , 1978, IEEE Trans. Inf. Theory.

[9]  Leslie G. Valiant,et al.  Fast probabilistic algorithms for hamiltonian circuits and matchings , 1977, STOC '77.

[10]  G. Owen,et al.  Game Theory (2nd Ed.) , 1983 .

[11]  A. P. Dawid,et al.  Present position and potential developments: some personal views , 1984 .

[12]  Manfred K. Warmuth,et al.  The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[13]  Vladimir Vovk,et al.  Aggregating strategies , 1990, COLT '90.

[14]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[15]  Neil D. Pearson,et al.  Consumption and Portfolio Policies With Incomplete Markets and Short‐Sale Constraints: the Finite‐Dimensional Case , 1991 .

[16]  Dean Phillips Foster  Prediction in the Worst Case , 1991 .

[17]  Éva Tardos,et al.  Fast approximation algorithms for fractional packing and covering problems , 1991, Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[18]  Neri Merhav,et al.  Universal prediction of individual sequences , 1992, IEEE Trans. Inf. Theory.

[19]  David Haussler,et al.  How to use expert advice , 1993, STOC.

[20]  Dean P. Foster,et al.  A Randomization Rule for Selecting Forecasts , 1993, Oper. Res.

[21]  Neal E. Young,et al.  Randomized rounding without solving the linear program , 1995, SODA '95.

[22]  Leonid Khachiyan,et al.  A sublinear-time randomized approximation algorithm for matrix games , 1995, Oper. Res. Lett.

[23]  Nicolò Cesa-Bianchi,et al.  Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[24]  Manfred K. Warmuth,et al.  Additive versus exponentiated gradient updates for linear prediction , 1995, STOC '95.

[25]  D. Fudenberg,et al.  Consistency and Cautious Fictitious Play , 1995 .

[26]  Vladimir Vovk,et al.  A game of prediction with expert advice , 1995, COLT '95.

[27]  Erik Ordentlich,et al.  Universal portfolios with side information , 1996, IEEE Trans. Inf. Theory.

[28]  T. Cover  Universal Portfolios , 1996 .

[29]  Yoav Freund,et al.  Game theory, on-line prediction and boosting , 1996, COLT '96.

[30]  Yoram Singer,et al.  On‐Line Portfolio Selection Using Multiplicative Updates , 1998, ICML.

[31]  Manfred K. Warmuth,et al.  Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput.

[32]  Philip N. Klein,et al.  On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms , 1999, SIAM J. Comput..

[33]  Dean P. Foster,et al.  Regret in the On-Line Decision Problem , 1999 .