Generalized attention-weighted reinforcement learning

In neuroscience, attention has been shown to interact bidirectionally with reinforcement learning (RL) to reduce the dimensionality of task representations, restricting computations to relevant features. In machine learning, attention mechanisms, despite their popularity, have seldom been applied to decision-making problems. Here, we leverage a theoretical model from computational neuroscience, attention-weighted RL (AWRL), which defines how humans identify task-relevant features (i.e., features that allow value predictions), to design an applied deep RL paradigm. We formally demonstrate that the conjunction of the self-attention mechanism, widely employed in machine learning, with value function approximation is a general formulation of the AWRL model. To evaluate our agent, we train it on three Atari tasks of differing complexity, incorporating both task-relevant and task-irrelevant features. Because the model uses semantic observations, we can uncover not only which features the agent elects to base decisions on, but also how it chooses to compile more complex, relational features from simpler ones. We first show that performance depends in large part on the ability to compile new compound features, rather than on mere focus on individual features. In line with neuroscience predictions, self-attention leads to high resilience to noise (irrelevant features) compared with benchmark models. Finally, we highlight the importance and separate contributions of both bottom-up and top-down attention in the learning process. Together, these results demonstrate the broader validity of the AWRL framework in complex task scenarios, and illustrate the benefits of a deeper integration between neuroscience-derived models and RL for decision making in machine learning.
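To make the link between AWRL and self-attention more concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes the commonly stated simplified form of AWRL, in which the value of a stimulus is the attention-weighted sum of its feature values and each feature's update is scaled by its attention weight. The function names, the learning rate, the mean pooling, and the weight matrices (`w_q`, `w_k`, `w_v`, `w_out`) are all hypothetical choices made for illustration.

```python
import numpy as np


def awrl_value(values, attention):
    """Attention-weighted value: V = sum_d attention[d] * values[d]."""
    return float(np.dot(attention, values))


def awrl_update(values, attention, reward, lr=0.1):
    """One prediction-error update, scaled per feature by its attention weight."""
    delta = reward - awrl_value(values, attention)  # reward prediction error
    return values + lr * attention * delta, delta


def softmax_rows(x):
    """Numerically stable softmax applied to each row."""
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def self_attention_value(features, w_q, w_k, w_v, w_out):
    """Single-head scaled dot-product self-attention with a linear value head.

    features: (n_features, d_in) array, one row per semantic feature.
    Mean pooling over rows is an illustrative assumption.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    weights = softmax_rows(q @ k.T / np.sqrt(k.shape[1]))
    mixed = weights @ v  # each row is an attention-weighted mix of features
    return float(mixed.mean(axis=0) @ w_out), weights


# Hypothetical usage: three features, attention concentrated on the first.
values = np.array([1.0, 0.0, 0.0])
attention = np.array([0.8, 0.1, 0.1])
new_values, delta = awrl_update(values, attention, reward=1.0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
w_out = rng.normal(size=4)
v_hat, attn = self_attention_value(feats, w_q, w_k, w_v, w_out)
```

In this sketch the fixed `attention` vector of the AWRL functions plays the role that the learned, input-dependent `weights` matrix plays in the self-attention version. Because each row of `weights` mixes several features, the attention layer can form compound features rather than only reweighting individual ones.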
