Distributed Learning for Planning Under Uncertainty Problems with Heterogeneous Teams

This paper considers the problem of multiagent sequential decision making under uncertainty with incomplete knowledge of the state transition model. A distributed learning framework is proposed in which each agent learns an individual model and shares the results with the team. The challenges associated with this approach include choosing a model representation for each agent and sharing these representations effectively under limited communication. A decentralized extension of the model learning scheme based on incremental Feature Dependency Discovery (Dec-iFDD) is presented to address the distributed learning problem. The representation selection problem is solved by leveraging iFDD's property of adjusting model complexity based on the observed data. The model sharing problem is addressed by having each agent rank the features of its representation by their contribution to model error reduction and broadcast the most relevant features to its teammates. The algorithm is tested on multiagent block building and persistent search and track missions. The results show that the proposed distributed learning scheme is particularly useful in heterogeneous learning settings, where each agent learns a significantly different model. Through large-scale planning-under-uncertainty simulations and flight experiments with state-dependent actuator and fuel-burn-rate uncertainty, we show that the proposed planning approach can outperform planners that do not account for heterogeneity between agents.
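The abstract does not include code; the following is a minimal Python sketch of the rank-and-broadcast step it describes, under stated assumptions. The `Agent` class, the `relevance` bookkeeping, and `sharing_round` are illustrative names introduced here, not the authors' implementation; in Dec-iFDD the features would be the conjunctive features discovered by iFDD, and the relevance score stands in for the model reduction error mentioned above.

```python
# Minimal sketch of the Dec-iFDD model-sharing step described in the abstract.
# All names here (Agent, relevance, sharing_round) are illustrative assumptions,
# not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class Agent:
    # feature -> accumulated model-error reduction attributed to that feature
    relevance: dict[str, float] = field(default_factory=dict)
    features: set[str] = field(default_factory=set)

    def observe(self, feature: str, error_reduction: float) -> None:
        """Record how much a discovered feature reduced the local model error."""
        self.features.add(feature)
        self.relevance[feature] = self.relevance.get(feature, 0.0) + error_reduction

    def top_features(self, k: int) -> list[str]:
        """Rank features by accumulated error reduction; keep only the k best."""
        return sorted(self.features, key=lambda f: -self.relevance.get(f, 0.0))[:k]

    def receive(self, broadcast: list[str]) -> None:
        """Merge a teammate's most relevant features into the local representation."""
        self.features.update(broadcast)


def sharing_round(team: list[Agent], k: int) -> None:
    """One communication round: each agent broadcasts its k most relevant
    features to every teammate, respecting a budget of k features per agent."""
    broadcasts = [agent.top_features(k) for agent in team]
    for i, agent in enumerate(team):
        for j, message in enumerate(broadcasts):
            if i != j:
                agent.receive(message)
```

For example, with a per-round communication budget of five features, calling `sharing_round(team, k=5)` merges each agent's five highest-ranked features into every teammate's representation; limiting the broadcast to the top-k features is what keeps the scheme usable under limited communication.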
