Differential dynamic programming for graph-structured dynamical systems: Generalization of pouring behavior with different skills

We explore differential dynamic programming for dynamical systems that form a directed graph structure. This planning method applies to complex tasks in which sub-tasks are sequentially connected and different skills are selected according to the situation. Pouring is an example: it involves grasping and moving a container, as well as selecting among skills such as tipping and shaking. Our method handles these situations: it plans the continuous parameters of each sub-task and skill, and also selects which skills to use. The method is based on stochastic differential dynamic programming, and uses stochastic neural networks to learn the dynamical systems when they are unknown. It is therefore a form of reinforcement learning; at the same time, it draws on ideas from artificial intelligence, such as graph-structured dynamical systems and a frame-and-slot representation of the large state-action vector, making this work a partial unification of these fields. We demonstrate the method in a simulated pouring task and show that it generalizes over material properties and container shapes. Accompanying video: https://youtu.be/_ECmnG2BLE8.
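To make the planning structure concrete, the sketch below (Python with NumPy) shows one way graph-structured planning with skill selection could look: a directed graph of sub-task dynamics with a discrete branch between a tipping and a shaking skill, Monte-Carlo evaluation of a stochastic model, and gradient-based optimization of each sub-task's continuous parameters. Everything here is an illustrative assumption rather than the paper's implementation: the dynamics functions (grasp, move, tip, shake), the GRAPH and PATHS layout, the quadratic cost, and the finite-difference optimizer are placeholders; the paper itself uses stochastic neural-network models and a stochastic differential dynamic programming backup instead of this first-order stand-in.

```python
# Illustrative sketch only, not the paper's implementation: the sub-task
# dynamics, cost, graph layout, and the finite-difference optimizer below are
# assumptions.  The paper uses stochastic neural-network dynamics models and a
# stochastic DDP backup instead of this simple first-order stand-in.
import numpy as np

# --- assumed sub-task / skill dynamics (state x, continuous parameter a) ---
def grasp(x, a, rng):
    return x + 0.1 * a                                   # deterministic sub-task

def move(x, a, rng):
    return x + a                                         # deterministic sub-task

def tip(x, a, rng):
    return x + 0.8 * a + 0.05 * rng.standard_normal(x.shape)   # low-noise skill

def shake(x, a, rng):
    return x + 1.2 * a + 0.2 * rng.standard_normal(x.shape)    # high-noise skill

GRAPH = {"grasp": grasp, "move": move, "tip": tip, "shake": shake}
PATHS = [["grasp", "move", "tip"],                       # skill branch 1
         ["grasp", "move", "shake"]]                     # skill branch 2

def expected_cost(path, actions, x0, goal, n_samples=32, seed=0):
    """Monte-Carlo estimate of the expected terminal cost along one path."""
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_samples):
        x = x0.copy()
        for name, a in zip(path, actions):
            x = GRAPH[name](x, a, rng)
        costs.append(np.sum((x - goal) ** 2))
    return float(np.mean(costs))

def optimize_path(path, x0, goal, iters=100, lr=0.1, eps=1e-3):
    """Optimize the continuous parameters of every sub-task on one path."""
    actions = [np.zeros_like(x0) for _ in path]
    for it in range(iters):
        for i in range(len(actions)):
            for j in range(x0.size):
                plus = [a.copy() for a in actions]
                minus = [a.copy() for a in actions]
                plus[i][j] += eps
                minus[i][j] -= eps
                # common random numbers (same seed) keep the finite-difference
                # gradient of the Monte-Carlo estimate from being noise-dominated
                g = (expected_cost(path, plus, x0, goal, seed=it)
                     - expected_cost(path, minus, x0, goal, seed=it)) / (2 * eps)
                actions[i][j] -= lr * g
    return actions, expected_cost(path, actions, x0, goal, seed=12345)

x0, goal = np.zeros(2), np.array([1.0, 0.5])
results = {tuple(p): optimize_path(p, x0, goal) for p in PATHS}
best = min(results, key=lambda p: results[p][1])         # discrete skill selection
print("selected skills:", best, "expected cost:", results[best][1])
```

In this toy setting the lower-noise tipping branch is selected because both branches can reach the goal in expectation but shaking incurs a larger variance penalty, which mirrors the idea of choosing a skill by comparing the expected costs of different paths through the graph. The shared seed for the plus/minus evaluations (common random numbers) is only needed because of the crude finite-difference gradient; a stochastic DDP backup would instead propagate value derivatives through the learned models.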
