Paper Information - Constraint Learning for Control Tasks with Limited Duration Barrier Functions

Constraint Learning for Control Tasks with Limited Duration Barrier Functions

When deploying autonomous agents in unstructured environments over sustained periods of time, adaptability and robustness often outweigh optimality as the primary consideration. In other words, safety and survivability constraints play a key role. In this paper, we present a novel constraint-learning framework for control tasks built on the idea of constraints-driven control. However, since control policies that keep a dynamical agent within state constraints over infinite horizons are not always available, this work instead considers constraints that can be satisfied over a sufficiently long time horizon T > 0, which we refer to as limited-duration safety. Value function learning can then be used as a tool to find limited-duration safe policies. We show that, in some applications, the existence of limited-duration safe policies is actually sufficient for long-duration autonomy. This idea is illustrated on a swarm of simulated robots that are tasked with covering a given area but that sporadically need to abandon this task to charge their batteries. We show how the battery-charging behavior naturally emerges as a result of the constraints. Additionally, using a cart-pole simulation environment, we show how a control policy can be efficiently transferred from the source task, balancing the pole, to the target task, moving the cart in one direction without letting the pole fall down.
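To make the notion of limited-duration safety concrete, the following is a minimal sketch of a discrete-time analogue: a policy counts as safe for a horizon T if a rollout from a given initial state stays inside the safe set for T steps. This is not the paper's method or its barrier-function construction. The toy one-dimensional battery model, the two policies, and all names (`step`, `in_safe_set`, `return_when_low`, `always_work`, `safe_for_duration`) are assumptions introduced purely for illustration.

```python
from typing import Callable, Tuple

# Hypothetical toy state: (position on a line, battery energy).
State = Tuple[float, float]
Policy = Callable[[State], float]

TASK_POSITION = 5.0   # assumed location of the coverage task
MAX_ENERGY = 20.0     # assumed battery capacity


def step(state: State, action: float) -> State:
    """Assumed toy dynamics: move by `action`; charge at position 0, drain elsewhere."""
    pos, energy = state
    pos = max(0.0, pos + action)
    if pos == 0.0:
        energy = min(MAX_ENERGY, energy + 5.0)  # charging station at the origin
    else:
        energy -= 1.0                           # constant drain away from the charger
    return pos, energy


def in_safe_set(state: State) -> bool:
    """Safe set: the battery is not depleted."""
    return state[1] > 0.0


def return_when_low(state: State) -> float:
    """Head back to the charger once energy barely covers the trip home."""
    pos, energy = state
    if energy <= pos + 1.0:
        return -1.0
    return 1.0 if pos < TASK_POSITION else 0.0


def always_work(state: State) -> float:
    """Go to the task position and stay there, ignoring the battery."""
    pos, _ = state
    return 1.0 if pos < TASK_POSITION else 0.0


def safe_for_duration(policy: Policy, x0: State, horizon: int) -> bool:
    """Roll the policy out for `horizon` steps; True if every visited state is safe."""
    state = x0
    for _ in range(horizon):
        if not in_safe_set(state):
            return False
        state = step(state, policy(state))
    return in_safe_set(state)


if __name__ == "__main__":
    x0 = (0.0, MAX_ENERGY)
    print(safe_for_duration(always_work, x0, 10))       # short horizon
    print(safe_for_duration(always_work, x0, 50))       # longer horizon
    print(safe_for_duration(return_when_low, x0, 500))  # charging policy
```

In this toy setting, `always_work` is safe only over short horizons, whereas `return_when_low` keeps returning to the charger and remains safe over much longer rollouts. This loosely mirrors the abstract's point that battery-charging behavior can follow from the constraint itself.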
