Learning control of complex skills

This dissertation presents a hierarchical controller that can learn to perform complex motor skills. Humans routinely coordinate many degrees of freedom smoothly and effortlessly to achieve complex goals, and we readily learn new patterns of coordination to produce new skills. Robots and other artificial systems, by contrast, typically have difficulty with the kinds of behaviors that come most naturally to us. Skills such as running, skiing, playing basketball, or diving involve complex nonlinear dynamics, many degrees of freedom, and behavioral goals that can be difficult to specify mathematically; goals such as “ski down the mountain without falling down” or “shoot a layup” must be translated from linguistic requirements into dynamic system constraints. This dissertation focuses on the skill of platform diving, in which the diver's goal is to execute a specified dive and enter the water in a fully extended, vertical position. Controlling a simulated diver is difficult for standard control and planning algorithms because conservation of angular momentum imposes a nonholonomic constraint with nonlinear drift on the system dynamics.

Ideas from the fields of biological motor control and learning are combined with new learning algorithms to design a hierarchical controller that learns to dive. At the lower level of the control hierarchy, each degree of freedom in the diver's joints is assigned a controller based on biological pattern generators for fast, single-joint movements. These controllers contain neural networks, which are trained on data generated by simulation. The higher level of the control hierarchy incorporates ideas from human skill learning: to achieve a desired behavior pattern, a human learning a new skill uses information from instructors and from watching other performers to build a mental model of the task requirements, and then practices to refine the parameters of this behavioral model. Accordingly, the high-level controller represents each dive as a sequence of multi-joint synergies. It learns initial estimates of the timing of these synergies from observational data and then refines these estimates through Q-learning with repeated simulations.
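The refinement step can be illustrated with a minimal tabular Q-learning sketch. This is not the dissertation's controller: the one-parameter "simulator", the discretized timing grid, the reward, and every name below (`simulate_entry_error`, `refine_timing`, `BEST_TIMING`, and so on) are hypothetical stand-ins chosen only to show how an initial timing estimate, such as one taken from observation, might be adjusted through repeated simulated trials.

```python
import random

# Hypothetical toy stand-in for the diving simulation: the entry error
# (arbitrary units) depends only on when a single synergy is triggered.
TIMINGS = [0.1 * k for k in range(11)]  # candidate onset times, 0.0 .. 1.0
BEST_TIMING = 0.7                       # assumed optimum, unknown to the learner
ACTIONS = (-1, 0, 1)                    # shift onset earlier, keep it, shift later


def simulate_entry_error(timing):
    """Toy simulator: the error grows with distance from the assumed optimum."""
    return abs(timing - BEST_TIMING)


def step(state, action_index):
    """Apply a timing shift, clamping the index to the timing grid."""
    return min(max(state + ACTIONS[action_index], 0), len(TIMINGS) - 1)


def greedy(q_row):
    """Index of the highest-valued action (first one wins on ties)."""
    return max(range(len(ACTIONS)), key=lambda i: q_row[i])


def refine_timing(initial_index, episodes=500, steps=20,
                  alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over discretized timing indices."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in TIMINGS]
    for _ in range(episodes):
        state = initial_index  # every episode starts from the initial estimate
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q[state])
            nxt = step(state, a)
            reward = -simulate_entry_error(TIMINGS[nxt])
            # Standard Q-learning update.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    # Follow the learned greedy policy to read off the refined timing.
    state = initial_index
    for _ in range(len(TIMINGS)):
        state = step(state, greedy(q[state]))
    return TIMINGS[state]


if __name__ == "__main__":
    print(refine_timing(initial_index=3))
```

In this toy setting the learner starts from a deliberately poor estimate (index 3, an onset of about 0.3) and the greedy policy moves it toward the assumed optimum. A real diver simulation would have a far larger state space and several interacting synergy timings, so a lookup table like this would presumably need to be replaced by a function approximator.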
