Guaranteed Safe Online Learning via Reachability: tracking a ground target using a quadrotor

While machine learning techniques have become popular tools in the design of autonomous systems, the asymptotic nature of their performance guarantees means that they should not be used in scenarios in which safety and robustness are critical for success. This limitation can be overcome by pairing machine learning algorithms with rigorous safety analyses, such as Hamilton-Jacobi-Isaacs (HJI) reachability. Guaranteed Safe Online Learning via Reachability (GSOLR) is a framework that combines HJI reachability with general machine learning techniques, allowing for the design of robotic systems that demonstrate both high performance and guaranteed safety. In this paper we show how the GSOLR framework can be applied to a target tracking problem, in which an observing quadrotor helicopter must keep a target ground vehicle with unknown (but bounded) dynamics inside its field of view at all times, while simultaneously building a motion model of the target. The resulting algorithm was implemented on board the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control, and was compared to a naive safety-only algorithm and a learning-only algorithm. Experimental results illustrate the success of the GSOLR algorithm, even in scenarios in which the machine learning algorithm performed poorly (and would otherwise have led to unsafe actions), thus demonstrating the power of this technique.
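The core idea behind GSOLR-style frameworks can be illustrated with a minimal switching controller: the learned policy acts freely while the system is safely inside the set certified by the HJI value function, and a worst-case safety policy takes over near the boundary. The sketch below is illustrative only; all names (`hji_value`, `safety_policy`, `learned_policy`) and the 1-D field-of-view dynamics are hypothetical stand-ins, not the paper's actual implementation.

```python
# Illustrative sketch of a GSOLR-style safety switch (hypothetical names).
# Assumptions: a precomputed HJI value function whose positive values mark
# the safe set, a guaranteed-safe fallback policy, and a learned policy
# that may be arbitrarily bad.

def hji_value(x):
    # Stand-in for an HJI reachability value function: positive => safe.
    # Here the 1-D relative position must stay in the field of view [-1, 1].
    return 1.0 - abs(x)

def safety_policy(x):
    # Worst-case optimal action: steer back toward the center of the FOV.
    return -1.0 if x > 0 else 1.0

def learned_policy(x):
    # Stand-in for a learned tracking controller that happens to be wrong:
    # it always drifts in one direction.
    return 0.5

def gsolr_controller(x, margin=0.1):
    # Use the learned action only while safely inside the certified set;
    # near the boundary, override it with the guaranteed-safe action.
    if hji_value(x) > margin:
        return learned_policy(x)
    return safety_policy(x)

# Simulate: even with a bad learned policy, the state stays in the FOV.
x, dt = 0.0, 0.1
for _ in range(100):
    x += dt * gsolr_controller(x)
print(abs(x) <= 1.0)  # True: the safety override keeps the target in view
```

The key design point, mirrored in the experiments described above, is that safety does not depend on the quality of the learned model: the override fires whenever the value function indicates the boundary of the safe set is near, regardless of what the learner proposes.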
