Hierarchical Control Using Networks Trained with Higher-Level Forward Models

We propose and develop a hierarchical approach to network control of complex tasks. In this approach, a low-level controller directs the activity of a “plant,” the system that performs the task. However, the low-level controller may be able to solve only fairly simple problems involving the plant. To accomplish more complex tasks, we introduce a higher-level controller that controls the lower-level controller. We use this system to direct an articulated truck to a specified location through an environment filled with static or moving obstacles. The final system consists of networks that have memorized associations between the sensory data they receive and the commands they issue. These networks are trained on a set of optimal associations generated by minimizing cost functions. Cost-function minimization requires predicting the consequences of command sequences, which is achieved by constructing forward models, including a model of the lower-level controller. The forward models and cost minimization are used only during training, allowing the trained networks to respond rapidly. In general, the hierarchical approach can be extended to a larger number of levels, dividing complex tasks into more manageable subtasks. The optimization procedure and the construction of the forward models and controllers can be performed in similar ways at each level of the hierarchy, which allows the system to be modified for other tasks or extended to more complex tasks without retraining the lower levels.
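The training scheme described above can be illustrated with a minimal sketch. All names and dynamics here are hypothetical simplifications (a 1D point plant rather than the paper's articulated truck): a low-level proportional controller drives the plant toward a subgoal, a higher-level forward model simulates the controller-plus-plant to predict where a subgoal leads, and an offline search minimizes a cost over candidate subgoals. The resulting optimal (state, subgoal) pairs are the kind of associations on which the higher-level network would be trained, so that the slow optimization is not needed at run time.

```python
import numpy as np

def plant_step(x, u, dt=0.1):
    """Plant dynamics: a bounded velocity command u moves the state x."""
    return x + dt * np.clip(u, -1.0, 1.0)

def low_level_controller(x, subgoal, gain=2.0):
    """Low-level controller: proportional command toward the subgoal."""
    return gain * (subgoal - x)

def forward_model(x, subgoal, horizon=20):
    """Higher-level forward model: predicts the state reached after the
    low-level controller pursues `subgoal` for `horizon` steps."""
    for _ in range(horizon):
        x = plant_step(x, low_level_controller(x, subgoal))
    return x

def cost(x, subgoal, goal, obstacle=0.5, margin=0.15):
    """Cost: squared distance of the predicted final state to the goal,
    plus a penalty for subgoals too close to a static obstacle."""
    x_final = forward_model(x, subgoal)
    penalty = 10.0 if abs(subgoal - obstacle) < margin else 0.0
    return (x_final - goal) ** 2 + penalty

def optimal_subgoal(x, goal, candidates=np.linspace(-1.0, 2.0, 121)):
    """Offline optimization: search subgoal candidates through the forward
    model; the winning (x, subgoal) pair becomes one training example."""
    costs = [cost(x, sg, goal) for sg in candidates]
    return float(candidates[int(np.argmin(costs))])

# Generate a small training set of (state, optimal subgoal) associations
# for a network to memorize; the forward model is used only here.
training_pairs = [(x0, optimal_subgoal(x0, goal=1.0))
                  for x0 in np.linspace(-1.0, 1.0, 5)]
```

Because the forward model and the cost search appear only in this data-generation phase, the trained network that replaces `optimal_subgoal` can respond rapidly online, mirroring the division of labor described above.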
