Approximating behavioral equivalence for scaling solutions of I-DIDs

The interactive dynamic influence diagram (I-DID) is a recognized graphical framework for sequential multiagent decision making under uncertainty. I-DIDs concisely represent the problem of how an individual agent should act in an uncertain environment shared with other agents of unknown types. Solving an I-DID is challenging because it requires solving the large number of models ascribed to the other agents. A known way to reduce this burden is to group models of other agents that are behaviorally equivalent. Identifying equivalence requires solving the models and comparing their solutions, which are generally represented as policy trees. Because the trees grow exponentially with the number of decision time steps, comparing entire policy trees becomes intractable, which limits the scalability of previous I-DID techniques. In this article, we instead compare partial policy trees and measure the distance between the updated beliefs at their leaves. We propose a principled way to determine how much of each policy tree to consider, which trades off solution quality for efficiency. We further improve on this technique by allowing the partial policy trees to have paths of differing lengths. We evaluate these approaches in multiple problem domains and demonstrate significantly improved scalability over previous approaches.
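The idea of comparing partial policy trees can be illustrated with a minimal sketch. It is not the article's algorithm: the `PolicyNode` structure, the fixed comparison depth, the L1 belief distance, the threshold `eps`, and the greedy grouping are all assumptions chosen for illustration. Two models are treated as approximately equivalent when their trees prescribe the same actions down to the chosen depth and the beliefs at the leaves of that partial tree are within `eps` of each other.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyNode:
    """Hypothetical policy-tree node: an action, the belief held at this
    node, and one child per possible observation."""
    action: str
    belief: List[float]
    children: Dict[str, "PolicyNode"] = field(default_factory=dict)


def l1_distance(p: List[float], q: List[float]) -> float:
    """L1 distance between two beliefs (an assumed choice of metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def approx_equivalent(a: PolicyNode, b: PolicyNode, depth: int, eps: float) -> bool:
    """Compare actions for `depth` levels, then compare leaf beliefs."""
    if depth == 0:
        # Leaf of the partial tree: only the updated beliefs are compared.
        return l1_distance(a.belief, b.belief) <= eps
    if a.action != b.action:
        return False
    if a.children.keys() != b.children.keys():
        return False
    return all(
        approx_equivalent(a.children[obs], b.children[obs], depth - 1, eps)
        for obs in a.children
    )


def group_models(trees: List[PolicyNode], depth: int, eps: float) -> List[List[int]]:
    """Greedily group tree indices; the first member represents its group."""
    groups: List[List[int]] = []
    for i, tree in enumerate(trees):
        for group in groups:
            if approx_equivalent(trees[group[0]], tree, depth, eps):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy example with made-up actions ("L", "OL") and observations ("GL", "GR").
tree1 = PolicyNode("L", [0.5, 0.5], {
    "GL": PolicyNode("L", [0.7, 0.3]),
    "GR": PolicyNode("L", [0.3, 0.7]),
})
tree2 = PolicyNode("L", [0.5, 0.5], {
    "GL": PolicyNode("L", [0.72, 0.28]),
    "GR": PolicyNode("L", [0.31, 0.69]),
})
tree3 = PolicyNode("OL", [0.9, 0.1])

loose = group_models([tree1, tree2, tree3], depth=1, eps=0.1)
strict = group_models([tree1, tree2, tree3], depth=1, eps=0.01)
```

In this toy setup a larger `eps` or a smaller `depth` merges more models into one group, which reflects the trade-off between solution quality and efficiency described above. Because each group is compared only against its first member, the resulting grouping can depend on the order of the models.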
