Paper Information - Leveraging Forward Model Prediction Error for Learning Control

Leveraging Forward Model Prediction Error for Learning Control

Learning for model-based control can be sample-efficient and generalize well; however, learning models and controllers that faithfully represent the problem at hand can be challenging for complex tasks. Using inaccurate models for learning can lead to sub-optimal solutions that are unlikely to perform well in practice. In this work, we present a learning approach that iterates between model learning and data collection and leverages forward model prediction error for learning control. We show how using the controller's prediction as input to a forward model creates a differentiable connection between the controller and the model, allowing us to formulate a loss in the state space. This lets us include forward model prediction error during controller learning, and we show that the resulting loss objective significantly improves learning on different motor control tasks. We provide empirical and theoretical results that show the benefits of our method and present evaluations in simulation for learning control on a 7 DoF manipulator and an underactuated 12 DoF quadruped. We show that our approach successfully learns controllers for challenging motor control tasks involving contact switching.
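
To make the idea of a differentiable connection between controller and forward model concrete, here is a minimal sketch on a toy one-dimensional system. It is an illustration under stated assumptions, not the paper's implementation: the linear plant `true_dynamics`, the linear controller form, the least-squares model fit, the exploration noise, and all function names are hypothetical. It only shows the controller's output being fed into a learned forward model, a loss written in the state space, and the alternation between data collection and model learning; it does not reproduce how the paper incorporates the forward model prediction error into the controller loss.

```python
import random

def true_dynamics(s, u):
    # Hypothetical plant, hidden from the learner: s' = 0.9*s + 0.5*u
    return 0.9 * s + 0.5 * u

def controller(w, s, s_des):
    # Assumed linear controller: u = w[0]*s + w[1]*s_des
    return w[0] * s + w[1] * s_des

def fit_forward_model(data):
    # Least-squares fit of s' ~ a*s + b*u using the 2x2 normal equations
    sss = sum(s * s for s, u, n in data)
    ssu = sum(s * u for s, u, n in data)
    suu = sum(u * u for s, u, n in data)
    ssn = sum(s * n for s, u, n in data)
    sun = sum(u * n for s, u, n in data)
    det = sss * suu - ssu * ssu
    return (ssn * suu - ssu * sun) / det, (sss * sun - ssu * ssn) / det

def controller_step(w, model, pairs, lr=2.0):
    # One gradient step on the mean state-space loss (f(s, pi(s, s_des)) - s_des)^2
    a, b = model
    g0 = g1 = 0.0
    for s, s_des in pairs:
        u = controller(w, s, s_des)
        err = (a * s + b * u) - s_des   # predicted next state minus desired state
        g0 += 2.0 * err * b * s         # chain rule: d(pred)/du = b, du/dw[0] = s
        g1 += 2.0 * err * b * s_des     # chain rule: du/dw[1] = s_des
    n = len(pairs)
    return [w[0] - lr * g0 / n, w[1] - lr * g1 / n]

def train(rounds=3, steps=60, seed=0):
    rng = random.Random(seed)
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    pairs = [(s, d) for s in grid for d in grid]  # training (state, target) pairs
    w, data = [0.0, 0.0], []
    for _ in range(rounds):
        # Data collection: run the current controller with exploration noise
        for _ in range(50):
            s, d = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            u = controller(w, s, d) + rng.gauss(0.0, 0.3)
            data.append((s, u, true_dynamics(s, u)))
        # Model learning on all data gathered so far
        model = fit_forward_model(data)
        # Controller learning through the learned forward model
        for _ in range(steps):
            w = controller_step(w, model, pairs)
    return w, model

w, model = train()
s, s_des = 0.4, -0.7
reached = true_dynamics(s, controller(w, s, s_des))
```

On this noise-free toy problem the fitted model matches the plant, so `reached` ends up close to `s_des`. With a nonlinear plant or an imperfect model, the gradient through the forward model would be only as good as the model itself, which is the situation the abstract describes.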
