Constraint Learning for Control Tasks with Limited Duration Barrier Functions

When autonomous agents are deployed in unstructured environments over sustained periods of time, adaptability and robustness often outweigh optimality as the primary consideration. Safety and survivability constraints therefore play a key role. In this paper, we present a novel constraint-learning framework for control tasks built on the idea of constraints-driven control. Because control policies that keep a dynamical agent within state constraints over an infinite horizon are not always available, this work instead considers constraints that can be satisfied over a sufficiently long time horizon T > 0, a property we refer to as limited-duration safety. Value function learning can then be used as a tool for finding limited-duration safe policies. We show that, in some applications, the existence of limited-duration safe policies is sufficient for long-duration autonomy. This idea is illustrated on a swarm of simulated robots that are tasked with covering a given area but that sporadically need to abandon this task to charge their batteries. We show how the battery-charging behavior emerges naturally as a result of the constraints. Additionally, using a cart-pole simulation environment, we show how a control policy can be efficiently transferred from a source task (balancing the pole) to a target task (moving the cart in one direction without letting the pole fall).
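
As a rough illustration of the limited-duration safety notion above, the following sketch checks by simulation whether a policy keeps a toy system inside its safe set for T steps from one initial state. It is not the paper's barrier-function or value-function method. The discrete-time battery model, the integer charge levels, the two policies, and all function names are assumptions made up for this example.

```python
from typing import Callable

State = int    # hypothetical state: battery charge in percent
Action = str   # hypothetical actions: "work" or "charge"


def is_limited_duration_safe(
    step: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    in_safe_set: Callable[[State], bool],
    x0: State,
    horizon: int,
) -> bool:
    """Return True if the rollout from x0 stays in the safe set for
    `horizon` steps (initial and final states included)."""
    x = x0
    for _ in range(horizon):
        if not in_safe_set(x):
            return False
        x = step(x, policy(x))
    return in_safe_set(x)


def battery_step(charge: State, action: Action) -> State:
    # Toy dynamics (assumed): working drains 1% per step,
    # charging restores 5% per step, capped at 100%.
    if action == "charge":
        return min(100, charge + 5)
    return charge - 1


def battery_alive(charge: State) -> bool:
    # Safe set (assumed): the battery is not empty.
    return charge > 0


def always_work(charge: State) -> Action:
    return "work"


def recharge_when_low(charge: State) -> Action:
    return "charge" if charge <= 20 else "work"


# Never charging is safe only over a limited horizon.
print(is_limited_duration_safe(battery_step, always_work, battery_alive, 100, 50))   # True
print(is_limited_duration_safe(battery_step, always_work, battery_alive, 100, 100))  # False
# A policy that abandons the task to recharge stays safe much longer.
print(is_limited_duration_safe(battery_step, recharge_when_low, battery_alive, 100, 1000))  # True
```

In this toy model the always-working policy is safe for T = 50 but not for T = 100, while the policy that interrupts the task to recharge remains safe over the longer horizon, which loosely mirrors the battery-charging behavior described for the robot swarm.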
