A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, however, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM allows the other agent's behavior neither to be generalized across different states nor to be specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL that integrates the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.
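
The contrast drawn in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm: the two-action setting, the state names, the single shared parameter `theta`, and the discretized prior over it are all assumptions chosen for illustration. The sketch compares an FDM-style belief, which keeps independent Dirichlet counts per state, with a belief over one parametric model shared by every state.

```python
from collections import defaultdict

# Hypothetical toy setting: the other agent picks one of two actions per state.
ACTIONS = ("stay", "move")

def fdm_update(counts, state, action):
    """FDM-style update: each state keeps its own Dirichlet pseudo-counts."""
    counts[state][action] += 1.0

def fdm_predict(counts, state):
    """Posterior predictive over the other agent's actions in one state."""
    total = sum(counts[state].values())
    return {a: counts[state][a] / total for a in ACTIONS}

def param_likelihood(theta, action):
    """Assumed parametric model: P(move) = theta in every state."""
    return theta if action == "move" else 1.0 - theta

def param_update(belief, action):
    """Bayes' rule on a discretized prior over the shared parameter theta."""
    posterior = {t: p * param_likelihood(t, action) for t, p in belief.items()}
    norm = sum(posterior.values())
    return {t: p / norm for t, p in posterior.items()}

def param_predict(belief):
    """Posterior predictive, identical for all states under this model."""
    p_move = sum(t * p for t, p in belief.items())
    return {"stay": 1.0 - p_move, "move": p_move}

# Uniform Dirichlet(1, 1) prior in every state for the FDM-style belief.
counts = defaultdict(lambda: {a: 1.0 for a in ACTIONS})
# Uniform prior over a grid of theta values for the parametric belief.
grid = [i / 10 for i in range(1, 10)]
belief = {t: 1.0 / len(grid) for t in grid}

# The other agent is observed to "move" four times, always in state s0.
observations = [("s0", "move")] * 4
for state, action in observations:
    fdm_update(counts, state, action)
    belief = param_update(belief, action)

fdm_seen = fdm_predict(counts, "s0")    # informed by the observations
fdm_unseen = fdm_predict(counts, "s1")  # still the prior: nothing carries over
param_any = param_predict(belief)       # applies to s0 and s1 alike
```

Under these assumptions, the per-state counts leave the prediction for the unvisited state `s1` at its prior, while the shared-parameter belief transfers what was learned in `s0` to every state. The choice of prior over `theta` is where domain knowledge would enter.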

Kian Hsiang Low | Trong Nghia Hoang
