Decision-Theoretic Planning Under Anonymity in Agent Populations

We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of the interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations -- anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents, compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the look-ahead search tree. For this, we introduce new fast-computing upper and lower bounds on the exact value function of the many-agent I-POMDP. These bounds speed up the look-ahead computations without trading off optimality, and reduce both memory and run-time complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains -- policing large protests, controlling traffic congestion at a busy intersection, and improving the AI of the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems and show that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward bringing self-interested planning under uncertainty to real-world applications.
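To make the anonymity argument concrete, here is a minimal Python sketch. It is not the article's code: the function names and the assumption of a single shared frame and one predicted per-agent action distribution are our simplifications. It contrasts the |A|^N joint actions of N other agents with the C(N+|A|-1, |A|-1) action-count configurations that anonymity lets a belief update range over, and computes the multinomial probability of a configuration.

```python
# A minimal sketch of frame-action anonymity: under anonymity the model
# depends only on HOW MANY other agents take each action, not on WHICH
# agents do. All names here are illustrative, not the article's code.
from itertools import combinations_with_replacement
from math import comb, factorial

def configurations(num_agents, num_actions):
    """Yield action-count vectors (c_1, ..., c_|A|) with sum c_a = N.
    There are C(N + |A| - 1, |A| - 1) of them -- polynomial in N for a
    fixed action set -- versus |A|**N distinct joint actions."""
    for multiset in combinations_with_replacement(range(num_actions), num_agents):
        yield tuple(multiset.count(a) for a in range(num_actions))

def config_prob(counts, action_probs):
    """Multinomial probability of one configuration, assuming the other
    agents are identically modeled and act independently."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    prob = 1.0
    for c, p in zip(counts, action_probs):
        prob *= p ** c
    return coeff * prob

if __name__ == "__main__":
    N, A = 1000, 3
    print(f"joint actions:  a {len(str(A ** N))}-digit number")  # 3**1000
    print(f"configurations: {comb(N + A - 1, A - 1)}")           # 501501
    # Sanity check on a small population: configuration probabilities sum to 1.
    total = sum(config_prob(c, [0.5, 0.3, 0.2]) for c in configurations(5, A))
    print(f"probability mass over configurations (N=5): {total:.6f}")
```

For a thousand other agents with three actions each, the configuration space has only 501,501 elements, which is the kind of reduction that makes a polynomial-time exact belief update possible.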
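The branch-and-bound speedup can likewise be sketched generically. The sketch below is not the article's algorithm: `upper_bound` and `lower_bound` are placeholders for the fast bounds the article introduces, and `model` is a hypothetical interface with `actions`, `observations`, `discount`, `expected_reward`, `obs_prob`, and `update` (the belief update). The point is the pruning rule: an action whose upper bound cannot beat an established lower bound is never expanded, so the look-ahead tree shrinks without giving up exactness.

```python
# A generic branch-and-bound look-ahead sketch. The bound functions are
# placeholders for the article's fast upper/lower bounds; `model` is a
# hypothetical interface, not the article's API.
def bb_lookahead(belief, horizon, model, upper_bound, lower_bound):
    """Exact finite-horizon look-ahead value of `belief`, with pruning.
    Contracts: lower_bound(b, h) <= optimal h-step value at b, and
    upper_bound(b, a, h) >= h-step value of taking a at b."""
    if horizon == 0:
        return 0.0  # convention: no reward is collected beyond the horizon

    best = lower_bound(belief, horizon)  # fast bound used as the pruning floor
    # Expanding promising actions first makes pruning trigger earlier.
    ordered = sorted(model.actions,
                     key=lambda a: upper_bound(belief, a, horizon),
                     reverse=True)
    for action in ordered:
        if upper_bound(belief, action, horizon) <= best:
            continue  # prune: this subtree cannot beat the current best
        q = model.expected_reward(belief, action)
        for obs in model.observations:
            p = model.obs_prob(belief, action, obs)
            if p > 0.0:
                next_belief = model.update(belief, action, obs)
                q += model.discount * p * bb_lookahead(
                    next_belief, horizon - 1, model, upper_bound, lower_bound)
        best = max(best, q)
    return best
```

Pruning here is safe because a skipped action's true value sits below a quantity that is itself a valid lower bound on the optimum, so the returned value remains exact, matching the abstract's claim that the speedup does not trade off optimality.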
