Deterministic response strategies in trial-and-error learning

Trial-and-error learning is a universal strategy for establishing which actions are beneficial or harmful in new environments. However, learning stimulus-response associations solely via trial and error is often suboptimal, because in many settings dependencies among stimuli and responses can be exploited to increase learning efficiency. Previous studies have shown that in settings featuring such dependencies, humans typically engage high-level cognitive processes and employ advanced learning strategies to improve their learning efficiency. Here we analyze in detail the initial learning phase of a sample of human subjects (N = 85) performing a trial-and-error learning task with deterministic feedback and hidden stimulus-response dependencies. Using computational modeling, we find that the standard Q-learning model cannot sufficiently explain human learning strategies in this setting. Instead, newly introduced deterministic response models, which are theoretically optimal and transform stimulus sequences unambiguously into response sequences, provide the best explanation for 50.6% of the subjects. Most of the remaining subjects either show a tendency towards generic optimal learning (21.2%) or at least partially exploit stimulus-response dependencies (22.3%), while a few subjects (5.9%) show no clear preference for any of the employed models. After the initial learning phase, asymptotic learning performance during the subsequent practice phase is best explained by the standard Q-learning model. Our results show that human learning strategies in trial-and-error learning go beyond merely associating stimuli and responses via incremental reinforcement. Specifically, during initial learning, high-level cognitive processes support sophisticated learning strategies that increase learning efficiency while keeping memory demands and computational effort bounded. The good asymptotic fit of the Q-learning model indicates that these cognitive processes are successively replaced by the formation of stimulus-response associations over the course of learning.
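For readers unfamiliar with the baseline, the following is a minimal sketch of a standard Q-learning agent with softmax response selection in a task with deterministic feedback. It is not the authors' implementation. The task layout (four stimuli, four responses, stimulus i rewarded for response i), the parameter values `alpha` and `beta`, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def softmax(values, beta):
    """Turn Q-values into response probabilities (beta = inverse temperature)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q, stimulus, response, reward, alpha):
    """Delta-rule update: move Q(s, r) toward the observed reward."""
    q[stimulus][response] += alpha * (reward - q[stimulus][response])


def simulate(n_stimuli=4, n_responses=4, n_trials=200,
             alpha=0.5, beta=5.0, seed=0):
    """Simulate one agent; all defaults are hypothetical, not from the study."""
    rng = random.Random(seed)
    # Assumed one-to-one mapping: stimulus i is correct for response i.
    correct = list(range(n_stimuli))
    q = [[0.0] * n_responses for _ in range(n_stimuli)]
    outcomes = []
    for _ in range(n_trials):
        s = rng.randrange(n_stimuli)
        probs = softmax(q[s], beta)
        r = rng.choices(range(n_responses), weights=probs)[0]
        # Deterministic feedback: 1 for the correct response, 0 otherwise.
        reward = 1.0 if r == correct[s] else 0.0
        q_update(q, s, r, reward, alpha)
        outcomes.append(reward)
    return q, outcomes


if __name__ == "__main__":
    q_table, rewards = simulate()
    print("mean accuracy:", sum(rewards) / len(rewards))
```

Note that this agent updates each stimulus-response pair independently. It never uses the information that a response already found correct for one stimulus is ruled out for the others, which is the kind of dependency the abstract describes subjects as exploiting.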
