Self-Modeling Neural Systems

Gregory D. Wayne

Goal-directedness is a fundamental property of all living things, but it is perhaps most easily identified in the movement patterns of animals. Ethologists have divided the basic forms of animal behavior into three categories: reproductive, defensive, and ingestive, all of which depend on the complex orchestration of motor control. In this dissertation, we use the framework of optimal control theory to model goal-directed behavior and repurpose it in new ways. We demonstrate a method for creating a hierarchical control network in which higher levels of the control hierarchy deal with increasingly abstract tasks. In a two-level system, the lower level handles short time-scale, low-dimensional motor control, and the higher level is charged with longer time-scale, higher-dimensional planning. Central to our approach to joining the levels is a forward model, constructed by the higher level, of the behavior of the lower level. Thus, we extend ideas of optimal control theory from controlling a “plant” to controlling a controller. We apply our method to the example problem of guiding a semi-truck in reverse around a field of obstacles. The lower-level controller drives the truck, and the higher-level controller detects obstacles and plans routes around them.

In other work, we consider whether a neural system that obeys certain biological constraints can solve optimal control problems. We exhibit a simple method to train a different kind of internal model, a neural network model of the Jacobian of the plant, and we integrate this internal model into a forward-in-time computation that produces an optimal feedback controller. We apply our method to two well-known model problems in optimal control: the torque-limited pendulum and cart-pole swing-up problems.
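The idea of a higher level that builds a forward model of a lower-level controller can be illustrated with a toy sketch. It is not the dissertation's method. It assumes a one-dimensional point mass in place of the truck, a hand-written PD controller as the lower level, and a linear map fitted with the delta rule in place of a neural network. All names and constants (`run_low_level`, `fit_forward_model`, `KP`, `KD`, `DT`, `K_STEPS`) are hypothetical.

```python
import random

DT = 0.05          # low-level time step (assumed)
K_STEPS = 20       # low-level steps per high-level decision (assumed)
KP, KD = 4.0, 3.0  # hypothetical PD gains of the lower-level controller


def run_low_level(pos, vel, subgoal, k=K_STEPS):
    """Closed-loop lower level: a PD controller drives a unit point mass toward `subgoal`."""
    for _ in range(k):
        force = KP * (subgoal - pos) - KD * vel
        vel += DT * force
        pos += DT * vel
    return pos, vel


def fit_forward_model(n_updates=5000, lr=0.1, seed=0):
    """Fit a linear map (pos, vel, subgoal) -> (next pos, next vel) with the delta rule.

    The model describes the plant and the lower-level controller together,
    so the higher level is "controlling a controller".
    """
    rng = random.Random(seed)
    weights = [[0.0] * 3 for _ in range(2)]  # one row per predicted output
    for _ in range(n_updates):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        target = run_low_level(*x)
        for row, t in zip(weights, target):
            err = t - sum(w * xi for w, xi in zip(row, x))
            for i in range(3):
                row[i] += lr * err * x[i]
    return weights


def predict(weights, pos, vel, subgoal):
    """Predict the state reached after K_STEPS low-level steps."""
    x = (pos, vel, subgoal)
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in weights)


def plan_subgoal(weights, pos, vel, target_pos):
    """Invert the model's position row to choose the subgoal expected to reach `target_pos`."""
    w_pos, w_vel, w_goal = weights[0]
    return (target_pos - w_pos * pos - w_vel * vel) / w_goal
```

In this sketch the higher level never touches forces directly: it chooses a subgoal, and the learned model tells it where the closed-loop lower level will end up. Because the toy closed loop is linear, a linear model suffices here; a nonlinear plant such as the truck would presumably need a richer function approximator.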
