Reinforcement active learning in the vibrissae system: Optimal object localization

Rats move their whiskers to acquire information about their environment. They have been observed to palpate novel objects as well as objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms: active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the currently hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward in an actor-critic design, such that behavior converges to the behavior that optimizes the learning process. We show that, in the context of object localization, both paradigms yield palpation whisking as their optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration, and they can guide future research into the underlying neuronal mechanisms that implement these principles. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve their active exploration of the environment.
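
The active-learning idea described above can be illustrated with a deliberately simplified sketch. This is not the model analyzed in the paper: it assumes a one-dimensional workspace normalized to [0, 1], a noiseless binary contact signal, and a learner that always probes at its current hypothesis of the object position. All names (contact, localize_active, localize_random) are hypothetical and introduced only for illustration. Under these assumptions, the probes concentrate around the object, which loosely resembles palpation, whereas random probing is shown as a baseline.

```python
import random


def contact(probe_angle, object_angle):
    """Hypothetical noiseless sensor: 1 if the whisker reaches the object, else 0."""
    return 1 if probe_angle >= object_angle else 0


def localize_active(object_angle, n_probes, lo=0.0, hi=1.0):
    """Active learning: probe at the current hypothesis (midpoint of the
    interval of positions still consistent with all labels so far)."""
    probes = []
    for _ in range(n_probes):
        probe = (lo + hi) / 2.0          # most informative sample
        probes.append(probe)
        if contact(probe, object_angle):
            hi = probe                   # object is at or before the probe
        else:
            lo = probe                   # object is beyond the probe
    return (lo + hi) / 2.0, hi - lo, probes


def localize_random(object_angle, n_probes, lo=0.0, hi=1.0, seed=0):
    """Baseline: probe at uniformly random angles, ignoring the hypothesis."""
    rng = random.Random(seed)
    for _ in range(n_probes):
        probe = rng.uniform(0.0, 1.0)
        if contact(probe, object_angle):
            hi = min(hi, probe)
        else:
            lo = max(lo, probe)
    return (lo + hi) / 2.0, hi - lo


if __name__ == "__main__":
    true_angle = 0.37                    # assumed object position
    est, width, probes = localize_active(true_angle, n_probes=10)
    print("active estimate:", est, "remaining uncertainty:", width)
    print("last probes:", probes[-3:])   # these cluster near the object
    est_r, width_r = localize_random(true_angle, n_probes=10)
    print("random estimate:", est_r, "remaining uncertainty:", width_r)
```

In this sketch each active probe halves the remaining uncertainty, so ten probes leave an interval of width 2^-10. The intrinsic-reward actor-critic formulation mentioned in the abstract is not implemented here.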
