Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning
Jorge Cortés | Michael Ouimet | Aaron Ma