Meta-learning, social cognition and consciousness in brains and machines

The intersection of neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have provided new ways to study brain mechanisms. A well-known example is reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn, for example by adjusting the hyperparameters of existing learning algorithms or by reusing existing models and knowledge to solve new tasks efficiently. This capability is important for making AI systems more adaptive and flexible, and because it is one of the areas where human performance still exceeds that of current AI systems, successful collaboration should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience research, we discuss recent developments in deep RL applied to modeling prefrontal cortex function. Taking a broader perspective, we then examine the similarities and differences between social cognition and meta-learning, and conclude with speculations on the potential links between intelligence, as endowed by model-based RL, and consciousness. For future work, we highlight data efficiency, autonomy, and intrinsic motivation as key research areas for advancing both fields.
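To make the hyperparameter-adjustment sense of meta-learning concrete, the sketch below shows an inner RL learner (Q-learning with a prediction-error update on a two-armed bandit) wrapped in an outer loop that tunes the inner learner's learning rate to maximize reward across fresh tasks. This is a minimal illustrative example under our own assumptions; the bandit task, function names, and hill-climbing outer loop are hypothetical and are not drawn from any of the works reviewed here.

```python
# Minimal sketch: meta-learning as tuning a hyperparameter (the learning
# rate) of an existing learning algorithm. Illustrative assumptions only.
import random

def run_bandit_episode(alpha, n_steps=200):
    """Inner loop: Q-learning on a two-armed bandit with learning rate alpha.
    Returns the total reward collected in one episode."""
    p_reward = [0.3, 0.7]           # reward probabilities, unknown to the agent
    q = [0.0, 0.0]                  # action-value estimates
    total = 0.0
    for _ in range(n_steps):
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < 0.1 else q.index(max(q))
        r = 1.0 if random.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])  # prediction-error (delta-rule) update
        total += r
    return total

def meta_learn(n_outer=50):
    """Outer loop: hill-climb the learning rate so the inner learner
    collects more reward on average across new task instances."""
    alpha, best = 0.5, 0.0
    for _ in range(n_outer):
        candidate = min(max(alpha + random.gauss(0.0, 0.05), 0.01), 1.0)
        score = sum(run_bandit_episode(candidate) for _ in range(20)) / 20
        if score > best:
            alpha, best = candidate, score
    return alpha

if __name__ == "__main__":
    print("meta-learned learning rate:", round(meta_learn(), 3))
```

The two-level structure is the point of the example: the inner loop is an ordinary RL algorithm, while the outer loop "learns how to learn" by adapting a quantity (here, the learning rate) that the inner loop treats as fixed.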
