Balancing exploration and exploitation with information and randomization

Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them against the benefits of exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, and their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
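The two strategies can be made concrete with a toy choice rule. The sketch below is illustrative only and is not the specific model used in the studies reviewed here. It assumes a two-armed bandit with Gaussian rewards, a logistic (two-option softmax) choice rule, and 1/sqrt(n) as a stand-in for how much information an option still carries. Directed exploration appears as an additive information bonus (`info_bonus`) favouring the more uncertain option, and random exploration appears as decision noise (`noise`) that flattens the choice curve. All function and parameter names are hypothetical.

```python
import math
import random


def choice_probability(mean_a, mean_b, info_a, info_b, info_bonus, noise):
    """Probability of choosing option A over option B (hypothetical rule).

    Directed exploration: info_bonus adds value in proportion to how much
    more informative (here: more uncertain) option A is than option B.
    Random exploration: noise acts as a temperature; larger values push the
    probability toward 0.5 regardless of value.
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    delta = (mean_a - mean_b) + info_bonus * (info_a - info_b)
    return 1.0 / (1.0 + math.exp(-delta / noise))


def run_bandit(true_means, n_trials, info_bonus, noise, seed=0):
    """Simulate the rule on a two-armed Gaussian bandit (reward sd = 1).

    Returns the list of chosen arms (0 or 1), one per free-choice trial.
    """
    rng = random.Random(seed)
    counts = [0, 0]
    totals = [0.0, 0.0]
    # Assumption: one forced sample of each arm so both means are defined.
    for arm in (0, 1):
        totals[arm] += rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1

    choices = []
    for _ in range(n_trials):
        means = [totals[i] / counts[i] for i in (0, 1)]
        # Uncertainty proxy: shrinks as an arm is sampled more often.
        info = [1.0 / math.sqrt(counts[i]) for i in (0, 1)]
        p_a = choice_probability(means[0], means[1], info[0], info[1],
                                 info_bonus, noise)
        arm = 0 if rng.random() < p_a else 1
        totals[arm] += rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        choices.append(arm)
    return choices
```

Setting `info_bonus` to zero leaves only random exploration, while shrinking `noise` toward zero makes the rule nearly deterministic, so any remaining exploration must come from the bonus. Keeping the two parameters separate is the point of the sketch: each strategy can be switched off independently of the other.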
