Paper information - Hedged learning: regret-minimization with learning experts

In non-cooperative multi-agent situations, there cannot exist a globally optimal, yet opponent-independent learning algorithm. Regret-minimization over a set of strategies optimized for potential opponent models is proposed as a good framework for deciding how to behave in such situations. Using longer playing horizons and experts that learn as they play, the regret-minimization framework can be extended to overcome several shortcomings of earlier approaches to the problem of multi-agent learning.
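As a rough illustration of what regret-minimization over a set of experts involves, the sketch below implements the standard exponential-weights (Hedge) rule in the full-information setting. It is not the algorithm proposed in the paper: the experts here are fixed reward columns rather than strategies that learn as they play, each decision lasts a single round rather than a longer horizon, and the names `hedge_weights`, `run_hedge`, `reward_rows`, and `eta` are assumptions introduced only for this example.

```python
import math
import random


def hedge_weights(cumulative_rewards, eta):
    """Return the exponential-weights distribution over experts."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative_rewards)
    raw = [math.exp(eta * (r - top)) for r in cumulative_rewards]
    total = sum(raw)
    return [w / total for w in raw]


def run_hedge(reward_rows, eta=0.5, seed=0):
    """Follow one sampled expert per round.

    reward_rows[t][i] is the (assumed) reward in [0, 1] that expert i
    would have earned in round t.
    """
    rng = random.Random(seed)
    n_experts = len(reward_rows[0])
    cumulative = [0.0] * n_experts
    earned = 0.0
    for rewards in reward_rows:
        probs = hedge_weights(cumulative, eta)
        choice = rng.choices(range(n_experts), weights=probs)[0]
        earned += rewards[choice]
        # Full-information update: every expert's reward is observed.
        for i, r in enumerate(rewards):
            cumulative[i] += r
    # Regret relative to the best single expert in hindsight.
    regret = max(cumulative) - earned
    return earned, regret, hedge_weights(cumulative, eta)


if __name__ == "__main__":
    rows = [[1.0, 0.0]] * 50  # expert 0 is always right, expert 1 never
    earned, regret, probs = run_hedge(rows)
    print(earned, regret, probs)
```

The regret reported here is measured against the best single expert in hindsight. The sketch also assumes that the rewards of unchosen experts can be observed and do not depend on the learner's own past actions, an assumption that generally fails when the opponent adapts to the learner's play.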

Leslie Pack Kaelbling | Yu-Han Chang
