Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments

Simulation provides a safe and efficient way to generate useful data for learning complex robotic tasks. However, matching simulation and real-world dynamics is challenging, especially for systems with many unobserved or unmeasurable parameters, which may lie in the robot dynamics itself or in the environment with which the robot interacts. We introduce a novel approach to this sim-to-real problem: we develop policies capable of adapting to new environments in a zero-shot manner. Key to our approach is an error-aware policy (EAP) that is explicitly made aware of the effect of unobservable factors during training. An EAP takes as input the predicted future state error in the target environment, provided by an error-prediction function trained simultaneously with the EAP. We validate our approach on an assistive walking device trained to help the human user recover from external pushes. We show that a trained EAP for a hip-torque assistive device can be transferred to different human agents with unseen biomechanical characteristics. In addition, we show that our method can be applied to other standard RL control tasks.
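The core mechanism described above is a policy conditioned on a learned estimate of how the target environment's dynamics will deviate from the training simulation. Below is a minimal sketch of that interface in PyTorch; the class names, the choice of inputs to the error predictor, and the layer sizes are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the EAP interface (illustrative; not the authors' code).
import torch
import torch.nn as nn

class ErrorPredictor(nn.Module):
    """Predicts the future state error of the target environment relative
    to the simulated dynamics, here from the current state and previous
    action (an assumed input choice)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, prev_action):
        return self.net(torch.cat([state, prev_action], dim=-1))

class ErrorAwarePolicy(nn.Module):
    """Policy conditioned on both the state and the predicted state error,
    so it can compensate for unobserved dynamics factors."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, predicted_error):
        return self.net(torch.cat([state, predicted_error], dim=-1))

# One error-aware control step (shapes only; environment dynamics omitted).
state_dim, action_dim = 12, 4
predictor = ErrorPredictor(state_dim, action_dim)
policy = ErrorAwarePolicy(state_dim, action_dim)

state = torch.zeros(1, state_dim)
prev_action = torch.zeros(1, action_dim)
err = predictor(state, prev_action)   # predicted future state error
action = policy(state, err)           # action that accounts for that error

# The predictor could be fit by regressing its output onto the observed gap
# between the simulated and target-environment next states, e.g.:
#   loss = F.mse_loss(predictor(s, a_prev), s_next_target - s_next_sim)
```

One way to read this design: the policy is conditioned on how the world will differ from simulation rather than on estimated physical parameters, which is what allows zero-shot adaptation without an explicit system-identification phase.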
