A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, however, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM allows the other agent's behavior neither to be generalized across different states nor to be specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL that integrates the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.
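The generalization limitation of FDM described above can be illustrated with a small sketch. It is not the paper's method: the class names, the two-action setting, and the shared-parameter model with a Beta(1, 1) prior are all hypothetical choices made only for illustration. The sketch contrasts an FDM-style model, which keeps an independent flat Dirichlet per state, with a parametric model whose single parameter is shared by all states.

```python
from collections import defaultdict

N_ACTIONS = 2  # assumption: the other agent has two actions, 0 and 1


class FDMOpponentModel:
    """Independent flat Dirichlet over the other agent's actions in each state."""

    def __init__(self, n_actions=N_ACTIONS):
        # Flat prior: every action in every state starts with pseudo-count 1.
        self.counts = defaultdict(lambda: [1.0] * n_actions)

    def update(self, state, action):
        self.counts[state][action] += 1.0

    def predict(self, state):
        # Posterior mean of the action distribution in this state only.
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]


class SharedParameterOpponentModel:
    """Hypothetical parametric model: the other agent picks action 0 with the
    same unknown probability theta in every state; theta has a Beta(1, 1) prior."""

    def __init__(self):
        self.alpha = 1.0  # pseudo-count for action 0
        self.beta = 1.0   # pseudo-count for action 1

    def update(self, state, action):
        # The state is ignored because theta is shared across states.
        if action == 0:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def predict(self, state):
        p0 = self.alpha / (self.alpha + self.beta)  # posterior mean of theta
        return [p0, 1.0 - p0]


fdm = FDMOpponentModel()
shared = SharedParameterOpponentModel()

# Observe the other agent choosing action 0 eight times, always in state "s0".
for _ in range(8):
    fdm.update("s0", 0)
    shared.update("s0", 0)

fdm_unseen = fdm.predict("s1")        # state "s1" was never visited
shared_unseen = shared.predict("s1")
print(fdm_unseen, shared_unseen)
```

Under these assumptions, the FDM-style model still predicts a uniform distribution in the unvisited state, whereas the shared-parameter model carries what it learned in "s0" over to "s1". The choice of parametric form and prior is where a practitioner's domain knowledge would enter.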
