Planning for large-scale multiagent problems via hierarchical decomposition with applications to UAV health management

This paper introduces a novel hierarchical decomposition approach for solving Multiagent Markov Decision Processes (MMDPs) by exploiting coupling relationships in the reward function. The MMDP is a natural framework for stochastic multi-stage multiagent decision-making problems, such as optimizing the mission performance of Unmanned Aerial Vehicles (UAVs) with stochastic health dynamics. However, computing optimal solutions is often intractable because the state and action spaces scale exponentially with the number of agents. Approximate solution techniques exist, but they typically rely on extensive domain knowledge. This paper presents the Hierarchically Decomposed MMDP (HD-MMDP) algorithm, which autonomously identifies different degrees of coupling in the reward function and decomposes the MMDP into a hierarchy of smaller MDPs that can be solved separately. Solutions to the smaller MDPs are embedded in an autonomously constructed tree structure to generate an approximate solution to the original problem. Simulation results show that HD-MMDP obtains more cumulative reward than an existing algorithm on a ten-agent Persistent Search and Track (PST) mission, a cooperative multi-UAV mission with more than 10^19 states, a stochastic fuel-consumption model, and a health-progression model.
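
The abstract does not reproduce the paper's pseudocode, so the following is only a minimal Python sketch of the general idea under stated assumptions: each node of the decomposition tree holds a small sub-MDP over a subset of agents whose rewards are coupled, each sub-MDP is solved independently by value iteration, and a joint action is assembled by walking the tree. The names `DecompositionNode`, `value_iteration`, and `to_local_state` are illustrative, not the paper's API, and how couplings are detected and how local states are extracted from the joint state are assumptions made here for the sake of a runnable example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve a small MDP. P has shape (A, S, S) with P[a, s, s'] = Pr(s' | s, a);
    R has shape (S, A). Returns the optimal values and a greedy policy."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


class DecompositionNode:
    """One node of an (assumed) decomposition tree: a sub-MDP over a subset of
    agents whose rewards are coupled, plus children handling weaker couplings."""

    def __init__(self, agent_ids, P, R, children=()):
        self.agent_ids = list(agent_ids)
        self.values, self.policy = value_iteration(P, R)  # solve the sub-MDP offline
        self.children = list(children)

    def act(self, joint_state, to_local_state):
        """Collect actions from this node and its descendants. `to_local_state`
        maps (joint_state, agent_ids) to this node's local state index."""
        local_state = to_local_state(joint_state, self.agent_ids)
        actions = {tuple(self.agent_ids): int(self.policy[local_state])}
        for child in self.children:
            actions.update(child.act(joint_state, to_local_state))
        return actions


# Tiny illustrative example: two single-agent sub-MDPs with 3 states and 2 actions,
# random dynamics and rewards, arranged as a two-node tree.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # P[a, s, :] sums to 1
R = rng.uniform(size=(3, 2))
leaf = DecompositionNode(agent_ids=[1], P=P, R=R)
root = DecompositionNode(agent_ids=[0], P=P, R=R, children=[leaf])
print(root.act(joint_state=(0, 2), to_local_state=lambda s, ids: s[ids[0]]))
```

The point of the sketch is the scaling argument in the abstract: each sub-MDP is solved over its own, much smaller state-action space, so the planning cost is governed by the size of the largest coupled subset of agents rather than by the full joint state space of all agents.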
