Operant conditioning: a minimal components requirement in artificial spiking neurons designed for bio-inspired robot's controller

In this paper, we investigate the operant conditioning (OC) learning process within a bio-inspired paradigm, using artificial spiking neural networks (ASNN) to act as robot brain controllers. In biological agents, OC results in behavioral changes learned from the consequences of previous actions, based on progressive prediction adjustment from rewarding or punishing signals. In a neurorobotics context, virtual and physical autonomous robots may benefit from a similar learning skill when facing unknown and unsupervised environments. In this work, we demonstrate that a simple invariant micro-circuit can sustain OC in multiple learning scenarios. The motivation for this new OC implementation model stems from the relatively complex alternatives that have been described in the computational literature and from recent advances in neurobiology. Our elementary kernel includes only a few crucial neurons and synaptic links, and its originality comes from the integration of habituation and spike-timing dependent plasticity as learning rules. Using several tasks of incremental complexity, our results show that a minimal neural component set is sufficient to realize many OC procedures. Hence, with the proposed OC module, designing learning tasks with an ASNN in a bio-inspired robot context leads to simpler neural architectures for achieving complex behaviors.
