Computational mechanisms of curiosity and goal-directed exploration

Successful behaviour depends on the right balance between maximising reward and soliciting information about the world. Here, we show how different types of information gain emerge when casting behaviour as surprise minimisation. We present two distinct mechanisms for goal-directed exploration that express separable profiles of active sampling to reduce uncertainty. ‘Hidden state’ exploration motivates agents to sample unambiguous observations to accurately infer the (hidden) state of the world. Conversely, ‘model parameter’ exploration compels agents to sample outcomes associated with high uncertainty, if they are informative for their representation of the task structure. We illustrate the emergence of these types of information gain, termed active inference and active learning respectively, and show how these forms of exploration induce distinct patterns of ‘Bayes-optimal’ behaviour. Our findings provide a computational framework to understand how distinct levels of uncertainty induce different modes of information gain in decision-making.
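The contrast between the two forms of exploration can be made concrete with a small numerical sketch. It is an illustration under simplifying assumptions, not the model used in the paper: hidden states and outcomes are discrete, information gain is taken to be the mutual information between an outcome and the quantity being inferred, and the 'model parameter' case is reduced to a single binary outcome whose probability has a Beta prior with integer counts. The function names (`state_information_gain`, `parameter_information_gain`) are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def state_information_gain(prior, likelihood):
    """Expected information gain about a hidden state from one observation.

    prior[s] is the current belief over hidden states.
    likelihood[o][s] is P(o | s), an assumed observation model.
    Returns the mutual information I(S; O) = H[P(o)] - E_s[H[P(o | s)]].
    """
    n_states = len(prior)
    predictive = [sum(row[s] * prior[s] for s in range(n_states))
                  for row in likelihood]
    # Expected ambiguity: average entropy of the outcomes given each state.
    ambiguity = sum(prior[s] * entropy([row[s] for row in likelihood])
                    for s in range(n_states))
    return entropy(predictive) - ambiguity


def harmonic(n):
    """n-th harmonic number, used for Beta expectations with integer counts."""
    return sum(1.0 / k for k in range(1, n + 1))


def parameter_information_gain(a, b):
    """Expected information gain about an outcome probability theta.

    theta has a Beta(a, b) belief with integer counts a, b >= 1 (assumption).
    Returns I(theta; o) = H[predictive] - E_theta[H[Bernoulli(theta)]].
    """
    n = a + b
    p = a / n
    predictive_entropy = entropy([p, 1 - p])
    expected_entropy = (p * (harmonic(n) - harmonic(a))
                        + (1 - p) * (harmonic(n) - harmonic(b)))
    return predictive_entropy - expected_entropy


if __name__ == "__main__":
    belief = [0.5, 0.5]
    unambiguous_cue = [[1.0, 0.0], [0.0, 1.0]]   # outcome reveals the state
    ambiguous_cue = [[0.5, 0.5], [0.5, 0.5]]     # outcome is uninformative
    print("state gain, unambiguous cue:", state_information_gain(belief, unambiguous_cue))
    print("state gain, ambiguous cue:  ", state_information_gain(belief, ambiguous_cue))
    print("parameter gain, novel option (1, 1):     ", parameter_information_gain(1, 1))
    print("parameter gain, familiar option (50, 50):", parameter_information_gain(50, 50))
```

In this sketch the state-related gain is largest for the option with the unambiguous outcome mapping, whereas the parameter-related gain is largest for the option with the fewest prior counts and shrinks as counts accumulate, which mirrors the separable profiles described above.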
