Approaches to Learning to Control Dynamic Uncertainty

In dynamic environments, when faced with a choice of learning strategy, do people mostly explore (maximizing their long-term gains) or exploit (maximizing their short-term gains)? More to the point, how does this choice of learning strategy influence their later ability to control the environment? In the present study, we examine whether people's self-reported learning strategies and levels of arousal (i.e., surprise, stress) correspond to performance measures of control in a Highly Uncertain or a Moderately Uncertain dynamic environment. Self-reports generally suggested an initial preference for exploring the environment; thereafter, those in the Highly Uncertain environment reported exploiting more than those in the Moderately Uncertain environment, although this difference did not affect performance on later tests of people's ability to control the dynamic environment. Levels of arousal were also differentially associated with the uncertainty of the environment. Going beyond the behavioral data, our model of dynamic decision-making revealed that there was in fact no difference in exploitation levels between the two environments, but there were differences in sensitivity to negative reinforcement. We consider the implications of these findings for learning and strategic approaches to controlling dynamic uncertainty.
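To make the distinction between exploitation level and sensitivity to negative reinforcement concrete, the sketch below shows a generic delta-rule reinforcement learner of the kind commonly fit to dynamic choice tasks; it is not the specific model reported here, and all function names, parameter names, and values are illustrative assumptions. A softmax inverse-temperature parameter captures how strongly the learner exploits its current value estimates, while separate learning rates for gains and losses capture differential sensitivity to negative reinforcement.

```python
import numpy as np

def simulate_agent(rewards, alpha_pos=0.3, alpha_neg=0.1, beta=2.0, rng=None):
    """Delta-rule learner with separate learning rates for positive and
    negative outcomes and a softmax (exploration/exploitation) choice rule.

    rewards:   array of shape (n_trials, n_options) giving the payoff each
               option would return on each trial (hypothetical data).
    alpha_pos / alpha_neg: learning rates for gains vs. losses; unequal
               values capture differential sensitivity to negative reinforcement.
    beta:      inverse temperature; higher values mean more exploitation,
               lower values mean more exploration.
    """
    rng = rng or np.random.default_rng()
    n_trials, n_options = rewards.shape
    q = np.zeros(n_options)                 # learned value of each option
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # Softmax over current value estimates: the exploration/exploitation dial.
        p = np.exp(beta * (q - q.max()))
        p /= p.sum()
        c = rng.choice(n_options, p=p)
        choices[t] = c
        # Prediction error, updated with an asymmetric learning rate.
        delta = rewards[t, c] - q[c]
        alpha = alpha_pos if delta >= 0 else alpha_neg
        q[c] += alpha * delta
    return choices

# Example: two options whose payoffs differ in variability, loosely
# mimicking a more and a less uncertain dynamic environment.
rng = np.random.default_rng(0)
payoffs = np.column_stack([rng.normal(1.0, 1.0, 200), rng.normal(0.5, 2.0, 200)])
print(simulate_agent(payoffs, rng=rng)[:20])
```

Fitting such a model to choice data allows exploitation (beta) and loss sensitivity (alpha_neg relative to alpha_pos) to be estimated separately, which is how behavioral similarity in one parameter can coexist with group differences in the other.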
