BL-WoLF: A Framework For Loss-Bounded Learnability In Zero-Sum Games

We present BL-WoLF, a framework for learnability in repeated zero-sum games where the cost of learning is measured by the losses the learning agent accrues (rather than the number of rounds). The game is adversarially chosen from some family that the learner knows. The opponent knows the game and the learner's learning strategy. The learner tries either to avoid accruing losses, or to learn about the game quickly so as to avoid future losses (this is consistent with the Win or Learn Fast (WoLF) principle; BL stands for "bounded loss"). Our framework allows for both probabilistic and approximate learning. The resulting notion of BL-WoLF-learnability can be applied to any class of games, and allows us to measure the inherent disadvantage to a player that does not know which game in the class it is in. We present guaranteed BL-WoLF-learnability results for families of games with deterministic payoffs and families of games with stochastic payoffs. We demonstrate that these families are guaranteed approximately BL-WoLF-learnable with lower cost. We then demonstrate families of games (both stochastic and deterministic) that are not guaranteed BL-WoLF-learnable. We show that those families, nevertheless, are expected BL-WoLF-learnable. To prove these results, we use a key lemma which we derive.

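To make the loss-bounded cost measure concrete, here is a minimal illustrative sketch (our own illustration, not taken from the paper). It uses a hypothetical family of two deterministic-payoff games: in game k, the learner loses nothing by playing action k and loses 1 otherwise, so a learner that knows the game can secure loss 0 every round. The `play_round` function and the `naive_probing_learner` strategy below are assumptions made for this example.

```python
# Illustrative sketch (assumption, not the paper's construction): the adversary
# picks one game from a known family; the learner's "cost of learning" is the
# total loss it accrues beyond the per-round value it could secure if it knew
# the game.

import random


def play_round(true_game, action):
    """Deterministic loss to the learner for one round of hypothetical game k:
    loss 0 if the learner plays action k, loss 1 otherwise."""
    return 0 if action == true_game else 1


def naive_probing_learner(true_game, rounds=20):
    """Try actions until one yields loss 0, then keep playing it.

    Returns the cumulative loss, which here measures the cost of learning:
    loss accrued before the learner has identified the game.
    """
    known_action = None
    untried = [0, 1]
    total_loss = 0
    for _ in range(rounds):
        action = known_action if known_action is not None else untried[0]
        loss = play_round(true_game, action)
        total_loss += loss
        if loss == 0:
            known_action = action      # the game is now identified
        elif known_action is None:
            untried.pop(0)             # rule out this action and probe the next


    return total_loss


if __name__ == "__main__":
    true_game = random.choice([0, 1])  # adversarially chosen in the framework
    cost = naive_probing_learner(true_game)
    print(f"game = {true_game}, cumulative loss over 20 rounds = {cost}")
```

In this toy family the cumulative loss of the probing learner is at most 1 no matter how many rounds are played; bounding total loss rather than the number of rounds needed to learn is the flavour of guarantee the framework formalizes.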