Decision-Theoretic Planning Under Anonymity in Agent Populations

We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of the interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations -- anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents, compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the look-ahead search tree. For this, we introduce new fast-computing upper and lower bounds on the exact value function of the many-agent I-POMDP. These bounds speed up the look-ahead computations without trading off optimality, and reduce both memory and run-time complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains -- policing large protests, controlling traffic congestion at a busy intersection, and improving the AI of the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems and show that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward bringing self-interested planning under uncertainty to real-world applications.
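To make the anonymity argument concrete, here is a minimal Python sketch. It is not the article's code: the function names and the assumption of a single shared frame and one predicted per-agent action distribution are our simplifications. It contrasts the |A|^N joint actions of N other agents with the C(N+|A|-1, |A|-1) action-count configurations that anonymity lets a belief update range over, and computes the multinomial probability of a configuration.

```python
# A minimal sketch of frame-action anonymity: under anonymity the model
# depends only on HOW MANY other agents take each action, not on WHICH
# agents do. All names here are illustrative, not the article's code.
from itertools import combinations_with_replacement
from math import comb, factorial

def configurations(num_agents, num_actions):
    """Yield action-count vectors (c_1, ..., c_|A|) with sum c_a = N.
    There are C(N + |A| - 1, |A| - 1) of them -- polynomial in N for a
    fixed action set -- versus |A|**N distinct joint actions."""
    for multiset in combinations_with_replacement(range(num_actions), num_agents):
        yield tuple(multiset.count(a) for a in range(num_actions))

def config_prob(counts, action_probs):
    """Multinomial probability of one configuration, assuming the other
    agents are identically modeled and act independently."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    prob = 1.0
    for c, p in zip(counts, action_probs):
        prob *= p ** c
    return coeff * prob

if __name__ == "__main__":
    N, A = 1000, 3
    print(f"joint actions:  a {len(str(A ** N))}-digit number")  # 3**1000
    print(f"configurations: {comb(N + A - 1, A - 1)}")           # 501501
    # Sanity check on a small population: configuration probabilities sum to 1.
    total = sum(config_prob(c, [0.5, 0.3, 0.2]) for c in configurations(5, A))
    print(f"probability mass over configurations (N=5): {total:.6f}")
```

For a thousand other agents with three actions each, the configuration space has only 501,501 elements, which is the kind of reduction that makes a polynomial-time exact belief update possible.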
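The branch-and-bound speedup can likewise be sketched generically. The sketch below is not the article's algorithm: `upper_bound` and `lower_bound` are placeholders for the fast bounds the article introduces, and `model` is a hypothetical interface with `actions`, `observations`, `discount`, `expected_reward`, `obs_prob`, and `update` (the belief update). The point is the pruning rule: an action whose upper bound cannot beat an established lower bound is never expanded, so the look-ahead tree shrinks without giving up exactness.

```python
# A generic branch-and-bound look-ahead sketch. The bound functions are
# placeholders for the article's fast upper/lower bounds; `model` is a
# hypothetical interface, not the article's API.
def bb_lookahead(belief, horizon, model, upper_bound, lower_bound):
    """Exact finite-horizon look-ahead value of `belief`, with pruning.
    Contracts: lower_bound(b, h) <= optimal h-step value at b, and
    upper_bound(b, a, h) >= h-step value of taking a at b."""
    if horizon == 0:
        return 0.0  # convention: no reward is collected beyond the horizon

    best = lower_bound(belief, horizon)  # fast bound used as the pruning floor
    # Expanding promising actions first makes pruning trigger earlier.
    ordered = sorted(model.actions,
                     key=lambda a: upper_bound(belief, a, horizon),
                     reverse=True)
    for action in ordered:
        if upper_bound(belief, action, horizon) <= best:
            continue  # prune: this subtree cannot beat the current best
        q = model.expected_reward(belief, action)
        for obs in model.observations:
            p = model.obs_prob(belief, action, obs)
            if p > 0.0:
                next_belief = model.update(belief, action, obs)
                q += model.discount * p * bb_lookahead(
                    next_belief, horizon - 1, model, upper_bound, lower_bound)
        best = max(best, q)
    return best
```

Pruning here is safe because a skipped action's true value sits below a quantity that is itself a valid lower bound on the optimum, so the returned value remains exact, matching the abstract's claim that the speedup does not trade off optimality.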
