Goal-directed feature learning

Only a subset of the available sensory information is useful for decision making. Classical models of the brain's sensory system, such as generative models, consider all elements of the sensory stimuli. However, only the action-relevant components of a stimulus need to reach the motor control and decision making structures of the brain. To learn these action-relevant features, the part of the sensory system that feeds into a motor control circuit needs some kind of relevance feedback. We propose a simple network model consisting of a feature learning (sensory) layer that feeds into a reinforcement learning (action) layer. Feedback is established by the reinforcement learner's temporal difference (delta) term modulating an otherwise Hebbian-like learning rule of the feature learner. Under this influence, the feature learning network learns only the relevant features of the stimuli, i.e. those features on which goal-directed actions are to be based. With the input preprocessed in this manner, the reinforcement learner performs well in delayed reward tasks. The learning rule approximates gradient descent on an energy function. The model presents a link between reinforcement learning and unsupervised learning and may help to explain how the basal ganglia receive selective cortical input.
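The core mechanism, a TD error that gates an otherwise Hebbian update in the sensory layer, can be illustrated with a few lines of code. The following is a minimal Python/NumPy sketch, not the paper's exact formulation: the winner-take-all feature activation, the SARSA-style TD error, the epsilon-greedy action selection, and all variable names are assumptions made for illustration. A feature layer h = f(Wx) feeds an action layer, and the same scalar delta both trains the action weights and modulates the Hebbian update of the feature weights W.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_feat, n_act = 100, 16, 4   # input, feature-layer and action-layer sizes (assumed)
gamma, eps = 0.9, 0.1              # discount factor, exploration rate
lr_v, lr_w = 0.1, 0.01             # learning rates for action and feature weights

W = rng.normal(scale=0.1, size=(n_feat, n_in))   # input -> feature weights (sensory layer)
V = np.zeros((n_act, n_feat))                    # feature -> action weights (RL layer)

def features(x):
    """Winner-take-all feature layer: one active unit per stimulus (assumed activation)."""
    h = np.zeros(n_feat)
    h[np.argmax(W @ x)] = 1.0
    return h

def select_action(h):
    """Epsilon-greedy action selection on the action layer's values Q = V h."""
    q = V @ h
    return int(rng.integers(n_act)) if rng.random() < eps else int(np.argmax(q))

def update(x, a, r, x_next, a_next, done):
    """One delta-modulated learning step: the TD error trains the action layer
    and gates the Hebbian-like update of the feature layer."""
    global W, V
    h, h_next = features(x), features(x_next)
    q, q_next = V[a] @ h, V[a_next] @ h_next
    delta = r + (0.0 if done else gamma * q_next) - q   # SARSA-style TD error
    V[a] += lr_v * delta * h                            # reinforcement update of action weights
    # delta-modulated Hebbian rule: post (h) times pre (x), scaled by the TD error,
    # so only action-relevant features are reinforced in the sensory layer
    W += lr_w * delta * np.outer(h, x)
    W /= np.linalg.norm(W, axis=1, keepdims=True)       # keep weight vectors bounded
```

In an environment loop one would call select_action and update on every transition. The design point the sketch is meant to convey is that the feature layer receives no separate teaching signal: its only relevance feedback is the reinforcement learner's delta, which is exactly the feedback path proposed in the abstract.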
