Chained learning architectures in a simple closed-loop behavioural context

Objective: Living creatures can learn or improve their behaviour by temporally correlating sensor cues, where near-senses (e.g., touch, taste) follow far-senses (e.g., vision, smell). This type of learning is related to classical and/or operant conditioning. Algorithmically, these approaches are very simple and consist of a single learning unit. The current study addresses this limitation by focusing on chained learning architectures in a simple closed-loop behavioural context.

Methods: We applied temporal sequence learning (Porr B and Wörgötter F 2006) in a closed-loop behavioural system in which a driving robot learns to follow a line. Here, for the first time, we introduce two types of chained learning architectures, named linear chain and honeycomb chain. We analyzed these architectures in open-loop and closed-loop contexts and compared them to the simple learning unit.

Conclusions: By implementing two types of simple chained learning architectures, we have demonstrated that stable behaviour can also be obtained in such architectures. The results also suggest that chained architectures can achieve better behavioural performance than simple architectures in cases where inputs are sparse in time and learning normally fails because of weak correlations.
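As a rough illustration of the single learning unit that the chained architectures are compared against, the sketch below assumes an input-correlation style rule, in which the weight of an earlier (predictive) input changes in proportion to that input multiplied by the time derivative of a later (reflex) input. The abstract does not state the rule, so this form, the low-pass filter, and all names and parameters (`lowpass`, `ico_weight`, `make_pulses`, `mu`, `a`) are assumptions made for illustration. The setting is open-loop and does not reproduce the robot experiment or the linear and honeycomb chains.

```python
import numpy as np

def lowpass(x, a=0.9):
    """First-order low-pass filter: turns a unit pulse into a decaying trace."""
    y = np.zeros(len(x), dtype=float)
    for t in range(1, len(x)):
        y[t] = a * y[t - 1] + (1.0 - a) * x[t]
    return y

def make_pulses(n_steps, period, pred_at, reflex_at):
    """Sparse pulse trains: one predictive and one reflex pulse per period."""
    pred = np.zeros(n_steps)
    reflex = np.zeros(n_steps)
    pred[pred_at::period] = 1.0
    reflex[reflex_at::period] = 1.0
    return pred, reflex

def ico_weight(pred, reflex, mu=0.5):
    """Accumulate w += mu * u1 * d(u0)/dt over the whole run (assumed rule)."""
    u1 = lowpass(pred)                    # filtered predictive (far-sense) input
    u0 = lowpass(reflex)                  # filtered reflex (near-sense) input
    du0 = np.diff(u0, prepend=u0[0])      # discrete-time derivative of the reflex
    w = 0.0
    for t in range(len(u1)):
        w += mu * u1[t] * du0[t]
    return w

# Predictive cue 5 steps before the reflex, then the reversed order.
pred, reflex = make_pulses(1000, 100, pred_at=10, reflex_at=15)
w_forward = ico_weight(pred, reflex)
pred, reflex = make_pulses(1000, 100, pred_at=15, reflex_at=10)
w_reverse = ico_weight(pred, reflex)
print(w_forward, w_reverse)
```

In this toy setting the weight grows when the predictive cue precedes the reflex and shrinks when the order is reversed. Each pulse pair contributes only a small weight change, which gives some intuition for why inputs that are sparse in time produce weak correlations.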

[1]  Norbert Wiener,et al.  Cybernetics: Control and Communication in the Animal and the Machine. , 1949 .

[2]  W. Walter An Imitation of Life , 1950 .

[3]  Norbert Wiener,et al.  Cybernetics, or control and communication in the animal and the machine, 2nd ed. , 1961 .

[4]  Viktor Mikhaĭlovich Glushkov,et al.  An Introduction to Cybernetics , 1957, The Mathematical Gazette.

[5]  D. McFarland Feedback mechanisms in animal behaviour , 1971 .

[6]  Ian H. Witten,et al.  An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control.

[7]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.

[8]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[9]  V. Braitenberg Vehicles, Experiments in Synthetic Psychology , 1984 .

[10]  B. Kosko Differential Hebbian learning , 1987 .

[11]  A. Klopf A neuronal model of classical conditioning , 1988 .

[12]  C. Watkins Learning from delayed rewards , 1989 .

[13]  Richard S. Sutton,et al.  Time-Derivative Models of Pavlovian Reinforcement , 1990 .

[14]  M. Gabriel,et al.  Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .

[15]  Geert Grooteplein Noord Adaptive Fields: Distributed Representations of Classically Conditioned Associations , 1991 .

[16]  Mitsuo Kawato,et al.  Neural network control for a closed-loop System using Feedback-error-learning , 1993, Neural Networks.

[17]  Michael A. Arbib,et al.  The handbook of brain theory and neural networks , 1995, A Bradford book.

[18]  Peter Dayan,et al.  Bee foraging in uncertain environments using predictive hebbian learning , 1995, Nature.

[19]  S. Nayar,et al.  Early Visual Learning , 1996 .

[20]  Dean A. Pomerleau,et al.  Neural Network Vision for Robot Driving , 1997 .

[21]  W. Schultz,et al.  Learning of sequential movements by neural network model with dopamine-like reinforcement signal , 1998, Experimental Brain Research.

[22]  Andrew G. Barto,et al.  Reinforcement learning in motor control , 1998 .

[23]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[24]  A. Kelley Functional Specificity of Ventral Striatal Compartments in Appetitive Behaviors , 1999, Annals of the New York Academy of Sciences.

[25]  E. Kandel,et al.  Is Heterosynaptic modulation essential for stabilizing hebbian plasticity and memory , 2000, Nature Reviews Neuroscience.

[26]  M. Davis,et al.  Using pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. , 2000, Learning & memory.

[27]  Michael F. Land Does Steering a Car Involve Perception of the Velocity Flow Field , 2001 .

[28]  Roland E. Suri,et al.  Temporal Difference Model Reproduces Anticipatory Neural Activity , 2001, Neural Computation.

[29]  Isaac Meilijson,et al.  Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors , 2002, Adapt. Behav..

[30]  Y. Niv,et al.  Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors , 2002 .

[31]  Barbara Webb,et al.  Robots in invertebrate neuroscience , 2002, Nature.

[32]  Florentin Wörgötter,et al.  Isotropic-sequence-order learning in a closed-loop behavioural system , 2003, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[33]  Florentin Wörgötter,et al.  Isotropic Sequence Order Learning , 2003, Neural Computation.

[34]  M. Tsukamoto,et al.  Mossy fibre synaptic NMDA receptors trigger non‐hebbian long‐term potentiation at entorhino‐CA3 synapses in the rat , 2003, The Journal of physiology.

[35]  Florentin Wörgötter,et al.  ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm , 2003, Neural Computation.

[36]  Y. Humeau,et al.  Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain , 2003, Nature.

[37]  T. Jay Dopamine: a potential substrate for synaptic plasticity and memory mechanisms , 2003, Progress in Neurobiology.

[38]  Paul F. M. J. Verschure,et al.  A real-world rational agent: unifying old and new AI , 2003, Cogn. Sci..

[39]  H. Ikeda,et al.  Role of AMPA and NMDA receptors in the nucleus accumbens shell in turning behaviour of rats: interaction with dopamine receptors , 2003, Neuropharmacology.

[40]  P. König,et al.  Involving the motor system in decision making , 2004, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[41]  Jun Nakanishi,et al.  Feedback error learning and nonlinear adaptive control , 2004, Neural Networks.

[42]  Alejandro Agostini,et al.  Trajectory tracking control of a rotational joint using feature-based categorization learning , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[43]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[44]  Justus H. Piater,et al.  Task-Driven Learning of Spatial Combinations of Visual Features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[45]  Florentin Wörgötter,et al.  Temporal Sequence Learning, Prediction, and Control: A Review of Different Models and Their Relation to Biological Mechanisms , 2005, Neural Computation.

[46]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[47]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[48]  Philipp Slusallek,et al.  Introduction to real-time ray tracing , 2005, SIGGRAPH Courses.

[49]  E. Jara,et al.  Second-order conditioning of human causal learning , 2006 .

[50]  Florentin Wörgötter,et al.  Strongly Improved Stability and Faster Convergence of Temporal Sequence Learning by Using Input Correlations Only , 2006, Neural Computation.

[51]  Gerald M Edelman,et al.  A cerebellar model for predictive motor control tested in a brain-based device. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[52]  Florentin Wörgötter,et al.  Adaptive, Fast Walking in a Biped Robot under Neuronal Control and Learning , 2007, PLoS Comput. Biol..

[53]  Florentin Wörgötter,et al.  Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison , 2008, Biological Cybernetics.

[54]  B. Kosko Differential Hebbian learning , 2008 .