Exploiting structure in coordinating multiple decision makers

This thesis is concerned with sequential decision making by multiple agents, whether they act cooperatively to maximize team reward or selfishly to maximize their individual rewards. The practical intractability of this general problem has led to efforts to identify special cases that admit efficient computation yet still represent a wide range of problems. In our work, we identify the class of problems with structured interactions, where the actions of one agent can have non-local effects on the transitions and/or rewards of another agent. We address the following research questions: (1) How can we compactly represent this class of problems? (2) How can we efficiently calculate agent policies that maximize team reward (for cooperative agents) or achieve equilibrium (for self-interested agents)? (3) How can we exploit structured interactions to make offline reasoning about communication tractable?

To represent our class of problems, we develop a new decision-theoretic model, Event-Driven Interactions with Complex Rewards (EDI-CR), that explicitly represents structured interactions. EDI-CR is a compact yet general representation capable of capturing problems where the degree of coupling among agents ranges from complete independence to complete dependence.

To calculate agent policies, we draw on several techniques from the field of mathematical optimization and adapt them to exploit the special structure in EDI-CR. We develop a Mixed Integer Linear Program formulation of EDI-CR with cooperative agents that results in programs much more compact and faster to solve than formulations that ignore structure. We also investigate the use of homotopy methods as an optimization technique, as well as the formulation of self-interested EDI-CR as a system of non-linear equations.

We examine communication in both cooperative and self-interested settings. For the cooperative setting, we develop heuristics that assess the impact of potential communication points and add the ones with the highest impact to the agents' decision problems. Our heuristics successfully pick communication points that improve team reward while keeping problem size manageable. Moreover, by controlling the amount of communication a heuristic introduces, our approach lets us control the tradeoff between solution quality and problem size. For self-interested agents, we consider an example setting where communication is an integral part of problem solving, but where the agents have a reason to be reticent (e.g., privacy concerns). We formulate this problem as a game of incomplete information and present a general algorithm for calculating an approximate equilibrium profile in this class of games.
