UAV cooperative control with stochastic risk models

Risk and reward are fundamental concepts in the cooperative control of unmanned systems. This paper focuses on a constructive relationship between a cooperative planner and a learner that mitigates the risk incurred during learning while boosting the asymptotic performance and safety of agent behavior. Our framework is an instance of the intelligent cooperative control architecture (iCCA) in which a learner (natural actor-critic or Sarsa) initially follows a "safe" policy generated by a cooperative planner (the consensus-based bundle algorithm, CBBA). The learner incrementally improves this baseline policy through interaction while avoiding behaviors believed to be "risky". This paper extends previous work on coupling learning and cooperative control strategies in real-time stochastic domains in two ways: (1) the risk analysis module now supports stochastic risk models, and (2) learning schemes that do not store the policy as a separate entity are integrated with the cooperative planner, extending the applicability of the iCCA framework. The performance of the resulting approaches is demonstrated through simulation of fuel-limited UAVs in a stochastic task assignment problem. Results show an 8% reduction in risk while improving performance by up to 30%.
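For intuition, the core idea of risk-gated action selection can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`StochasticRiskModel`, `select_action`, `RISK_CAP`), the Monte-Carlo risk estimator, and the fallback rule are all assumptions made for this example.

```python
RISK_CAP = 0.1  # assumed maximum acceptable probability of failure


class StochasticRiskModel:
    """Monte-Carlo estimate of P(failure | state, action) under a
    stochastic transition model (e.g., probabilistic fuel burn).
    Hypothetical sketch; not the paper's actual risk module."""

    def __init__(self, simulate_step, is_failure, n_samples=100):
        self.simulate_step = simulate_step  # (state, action) -> sampled next state
        self.is_failure = is_failure        # state -> bool (e.g., fuel exhausted)
        self.n_samples = n_samples

    def risk(self, state, action):
        failures = sum(
            self.is_failure(self.simulate_step(state, action))
            for _ in range(self.n_samples)
        )
        return failures / self.n_samples


def select_action(state, learner_action, planner_action, risk_model):
    """Prefer the learner's suggestion, but fall back to the planner's
    'safe' baseline when the suggested action is deemed too risky."""
    candidate = learner_action(state)
    if risk_model.risk(state, candidate) <= RISK_CAP:
        return candidate
    return planner_action(state)  # CBBA-style baseline, assumed safe
```

Under this reading, the risk filter sits between the learner's policy and the environment: the learner explores and improves on the planner's baseline, while any action whose estimated failure probability exceeds the threshold is replaced by the planner's own choice.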
