Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System

This paper describes our strategy for reinforcement learning in robotic domains, including the use of neural networks. We summarize our recent work on model-based reinforcement learning, in which models of a hierarchical dynamic system are learned with stochastic neural networks [Yamaguchi and Atkeson, 2016b] and actions are planned with stochastic differential dynamic programming [Yamaguchi and Atkeson, 2015]. In particular, this paper clarifies why we believe our strategy works in complex robotic tasks such as pouring.

[1] David J. Reinkensmeyer, et al. Task-level robot learning, 1988, Proceedings. 1988 IEEE International Conference on Robotics and Automation.

[2] Christopher G. Atkeson, et al. Task-level robot learning: juggling a tennis ball more accurately, 1989, Proceedings, 1989 International Conference on Robotics and Automation.

[3] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[4] Stefan Schaal, et al. Robot Learning From Demonstration, 1997, ICML.

[5] George A. Bekey, et al. On autonomous robots, 1998, The Knowledge Engineering Review.

[6] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[7] Eduardo F. Morales, et al. Dynamic Reward Shaping: Training a Robot by Voice, 2010, IBERAMIA.

[8] Guido Sanguinetti, et al. Advances in Neural Information Processing Systems 24, 2011.

[9] Yuval Tassa, et al. High-order local dynamic programming, 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[10] P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[11] Shay B. Cohen, et al. Advances in Neural Information Processing Systems 25, 2012, NIPS 2012.

[12] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.

[13] Christian Szegedy, et al. DeepPose: Human Pose Estimation via Deep Neural Networks, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Luigi Acerbi, et al. Advances in Neural Information Processing Systems 27, 2014.

[16] Nolan Wagener, et al. Learning contact-rich manipulation skills with guided policy search, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[17] Christopher G. Atkeson, et al. Differential dynamic programming with temporally decomposed dynamics, 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).

[18] Christopher G. Atkeson, et al. Neural networks and differential dynamic programming for reinforcement learning problems, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[19] Abhinav Gupta, et al. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).