Deep Reinforcement Learning with Linear Quadratic Regulator Regions

Practitioners often rely on compute-intensive domain randomization to ensure that reinforcement learning policies trained in simulation transfer robustly to the real world. Because of unmodeled nonlinearities in the real system, however, even policies trained this way can fail to run stably enough to gather experience on real hardware. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to construct the controller from a "bias-shifted" neural network and train that network in the simulator. The modified network not only captures the nonlinearities of the system but also provably remains linear in a designated region of the state space, and can therefore be tuned to match a linear quadratic regulator that is known to stabilize the real system. We validate the method by transferring simulated swing-up policies for an inverted pendulum to a real system and demonstrating its efficacy.
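
To make the construction concrete, here is a minimal numpy sketch, not the paper's actual architecture or training procedure: inflating the hidden biases of a ReLU layer keeps every unit in its active regime on a ball around the equilibrium, so the network is exactly affine there, and its output layer can then be solved to reproduce a given LQR gain. All names and dimensions (W1, b1, W2, b2, K, radius) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_state, n_hidden, n_action = 4, 32, 1
W1 = rng.normal(scale=0.1, size=(n_hidden, n_state))

# Placeholder LQR gain for u = -K x; in practice K would come from solving
# the algebraic Riccati equation for the real system's linearization about
# the upright equilibrium.
K = np.array([[-1.0, -2.5, 18.0, 3.2]])

radius = 0.2  # ball around the equilibrium where linearity must hold

# Shift each hidden bias above its worst-case negative pre-activation on
# the ball: |w_i . x| <= ||w_i|| * radius, so this choice keeps every
# ReLU in its identity (active) regime for all ||x|| <= radius.
b1 = np.linalg.norm(W1, axis=1) * radius + 1e-3

# Inside the ball the network is exactly affine:
#   policy(x) = (W2 @ W1) x + (W2 @ b1 + b2),
# so pick W2 with W2 @ W1 = -K (W1 has full column rank, so the
# pseudoinverse gives an exact solution) and cancel the offset with b2.
W2 = -K @ np.linalg.pinv(W1)
b2 = -(W2 @ b1)

def policy(x):
    """Bias-shifted ReLU network; reduces to u = -K x for ||x|| <= radius."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

x = np.array([0.05, -0.05, 0.10, -0.02])  # a state inside the ball
assert np.allclose(policy(x), -K @ x)     # affine region matches the LQR
```

Outside the ball the units are free to switch regimes, so the network can still capture the system's nonlinearities; training would then adjust the weights while preserving the linear region, which is where the claimed stability guarantee applies.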
