Learning in Noise: Dynamic Decision-Making in a Variable Environment.

In engineering systems, noise is a curse, obscuring important signals and increasing the uncertainty associated with measurement. However, the negative effects of noise and uncertainty are not universal. In this paper, we examine how people learn sequential control strategies given different sources and amounts of feedback variability. In particular, we consider people's behavior in a task where short- and long-term rewards are placed in conflict (i.e., the option that is best in the short term is worst in the long term). Consistent with a model based on reinforcement learning principles (Gureckis & Love, in press), we find that learners differentially weight information predictive of the current task state. When cues that signal state are noisy and uncertain, participants' ability to identify an optimal strategy is strongly impaired relative to equivalent amounts of uncertainty that obscure the rewards/valuations of those states. In other situations, noise and uncertainty in reward signals may paradoxically improve performance by encouraging exploration. Our results demonstrate how experimentally manipulated task variability can be used to test predictions about the mechanisms that learners engage in dynamic decision-making tasks.
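The task described above pits immediate against delayed reward: the short-term option always pays more now, but repeatedly choosing the long-term option shifts the environment into more rewarding states. The sketch below is a minimal, hypothetical version of such an environment paired with a simple delta-rule learner using softmax choice. The payoff values, state coding, and learning rule are illustrative assumptions, not the parameters of the experiments or of the Gureckis & Love model; the `noise_sd` parameter adds Gaussian noise to rewards so the effect of reward variability on exploration can be probed directly.

```python
import math
import random

ACTIONS = ("short", "long")

def reward(h, action, noise_sd=0.0, rng=random):
    # h = number of "long" choices in the last 10 trials (0..10).
    # The "short" action always pays 2 more immediately, but only
    # "long" choices raise h, and each point of h adds 1 to both
    # payoffs -- so always choosing "long" is globally optimal
    # (11 vs. 3 points per trial in the long run).
    base = (3 + h) if action == "short" else (1 + h)
    return base + (rng.gauss(0, noise_sd) if noise_sd else 0.0)

def run_agent(n_trials=500, noise_sd=0.0, alpha=0.1, temp=0.5, seed=1):
    """Delta-rule learner with softmax choice; returns mean payoff."""
    rng = random.Random(seed)
    Q = {(h, a): 0.0 for h in range(11) for a in ACTIONS}
    history = ["short"] * 10   # last 10 choices define the task state
    total = 0.0
    for _ in range(n_trials):
        h = history.count("long")
        # softmax over the two Q-values for the current state
        weights = [math.exp(Q[(h, a)] / temp) for a in ACTIONS]
        action = rng.choices(ACTIONS, weights=weights)[0]
        r = reward(h, action, noise_sd, rng)
        # delta-rule update toward the immediate (possibly noisy) reward
        Q[(h, action)] += alpha * (r - Q[(h, action)])
        history = history[1:] + [action]
        total += r
    return total / n_trials
```

Comparing `run_agent(noise_sd=0.0)` against `run_agent(noise_sd=3.0)` across seeds gives a crude way to see whether reward noise, by flattening early Q-value differences, pushes the learner to sample the long-term option more often.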

[1]  Ward Edwards,et al.  Dynamic Decision Theory and Probabilistic Information Processing , 1962 .

[2]  Dana H. Ballard,et al.  Learning to perceive and act by trial and error , 1991, Machine Learning.

[3]  Prasad Tadepalli,et al.  Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..

[4]  N. Smelser,et al.  International Encyclopedia of the Social and Behavioral Sciences , 2001 .

[5]  A. Tversky,et al.  On the psychology of prediction , 1973 .

[6]  Alban D. Sorensen,et al.  On Truth and Practice. , 1905 .

[7]  N. Daw,et al.  Reinforcement learning models of the dopamine system and their behavioral implications , 2003 .

[8]  Wayne D. Gray,et al.  Melioration Dominates Maximization: Stable Suboptimal Performance Despite Global Feedback , 2006 .

[9]  Jerome R. Busemeyer,et al.  An adaptive approach to human decision making: Learning theory, decision theory, and human performance. , 1992 .

[10]  B. Dosher,et al.  Characterizing human perceptual inefficiencies with equivalent internal noise. , 1999, Journal of the Optical Society of America. A, Optics, image science, and vision.

[11]  D. Gilbert,et al.  The correspondence bias. , 1995, Psychological bulletin.

[12]  Joel L. Davis,et al.  Adaptive Critics and the Basal Ganglia , 1995 .

[13]  Bradley C. Love,et al.  Short-term gains, long-term pains: How cues about state aid learning in dynamic environments , 2009, Cognition.

[14]  R. Mathews,et al.  Insight without Awareness: On the Interaction of Verbalization, Instruction and Practice in a Simulated Process Control Task , 1989 .

[15]  D. M. Green,et al.  Signal detection theory and psychophysics , 1966 .

[16]  R. Herrnstein,et al.  Melioration: A Theory of Distributed Choice , 1991 .

[17]  V. A. Harris,et al.  The Attribution of Attitudes , 1967 .

[18]  Jadin C. Jackson,et al.  Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. , 2007, Psychological review.

[19]  R. Herrnstein Experiments on Stable Suboptimality in Individual Behavior , 1991 .

[20]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[21]  Richard S. Sutton, Andrew G. Barto  Reinforcement Learning: An Introduction , 1998, MIT Press.

[22]  Wayne D. Gray,et al.  Categorization and Reinforcement Learning: State Identification in Reinforcement Learning and Network Reinforcement Learning , 2007 .

[23]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[24]  Bernard Widrow,et al.  Adaptive switching circuits , 1988 .

[25]  P. Dayan,et al.  Cortical substrates for exploratory decisions in humans , 2006, Nature.

[26]  D. Broadbent,et al.  Interactive tasks and the implicit‐explicit distinction , 1988 .

[27]  J. Kruschke Bayesian approaches to associative learning: From passive to active learning , 2008, Learning & behavior.

[28]  J. Busemeyer,et al.  Learning Functional Relations Based on Experience With Input-Output Pairs by Humans and Artificial Neural Networks , 2005 .

[29]  David Elkind,et al.  Learning: An Introduction , 1968 .

[30]  R. Sun,et al.  The interaction of the explicit and the implicit in skill learning: a dual-process approach. , 2005, Psychological review.

[31]  Jason M. Gold,et al.  Characterizing perceptual learning with external noise , 2004, Cogn. Sci..

[32]  John R. Anderson,et al.  From recurrent choice to skill learning: a reinforcement-learning model. , 2006, Journal of experimental psychology. General.

[33]  I. Erev,et al.  Small feedback‐based decisions and their limited correspondence to description‐based decisions , 2003 .

[34]  Timothy J. Pleskac,et al.  Theoretical tools for understanding and aiding dynamic decision making , 2009 .

[35]  R. E. Kalman  A New Approach to Linear Filtering and Prediction Problems , 1960 .

[36]  B. Brehmer Dynamic decision making: human control of complex systems. , 1992, Acta psychologica.

[37]  John N. Tsitsiklis, Benjamin Van Roy  Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[38]  D. Shanks,et al.  A re‐examination of melioration and rational choice , 2002 .

[39]  A. Kacelnik Normative and descriptive models of decision making: time discounting and risk sensitivity. , 2007, Ciba Foundation symposium.

[40]  R. Luce,et al.  Individual Choice Behavior: A Theoretical Analysis. , 1960 .

[41]  Samuel M. McClure,et al.  Short-term memory traces for action bias in human reinforcement learning , 2007, Brain Research.

[42]  D G Pelli,et al.  Why use noise? , 1999, Journal of the Optical Society of America. A, Optics, image science, and vision.

[43]  B. Skinner 'Superstition' in the pigeon. 1948. , 1992, Journal of experimental psychology. General.

[44]  A. Inkeles,et al.  International Encyclopedia of the Social Sciences. , 1968 .

[45]  L. Green,et al.  Discounting of delayed rewards: Models of individual choice. , 1995, Journal of the experimental analysis of behavior.

[47]  Prasad Tadepalli,et al.  Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.

[48]  Bruce D. Burns,et al.  Heuristics as beliefs and as behaviors: The adaptiveness of the “hot hand” , 2004, Cognitive Psychology.

[49]  Y. Niv  The effects of motivation on habitual instrumental behavior , 2007 .

[50]  A. Tversky, D. Kahneman  Belief in the Law of Small Numbers , 1994 .

[51]  Arthur B Markman,et al.  Regulatory fit effects in a choice task , 2007, Psychonomic bulletin & review.

[52]  Jerker Denrell,et al.  Why most people disapprove of me: experience sampling in impression formation. , 2005, Psychological review.

[53]  Michael L. Littman,et al.  A tutorial on partially observable Markov decision processes , 2009 .

[54]  David S. Touretzky,et al.  Behavioral considerations suggest an average reward TD model of the dopamine system , 2000, Neurocomputing.

[57]  T. Marill Detection theory and psychophysics , 1956 .

[58]  Anton Schwartz,et al.  A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.

[59]  Grant C. Baldwin,et al.  A test of the regulatory fit hypothesis in perceptual classification learning , 2006, Memory & cognition.