Searching for rewards in graph-structured spaces

How do people generalize and explore structured spaces? We study human behavior on a multi-armed bandit task in which rewards are influenced by the connectivity structure of a graph. A detailed predictive model comparison shows that a Gaussian Process regression model with a diffusion kernel best describes participants' choices and also predicts their judgments about expected reward and confidence. This model unifies psychological models of function learning with the Successor Representation used in reinforcement learning, thereby building a bridge between different models of generalization.
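The core modeling idea can be sketched in a few lines: build a diffusion kernel from a graph's Laplacian, then use it as the covariance function of a Gaussian Process to predict rewards at unobserved nodes. The following is a minimal illustration, assuming numpy/scipy; the 5-node ring graph, the diffusion parameter, and the observed rewards are invented for the example and are not the paper's actual task or parameter values.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative graph: a 5-node ring (adjacency matrix A).
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
beta = 0.5                          # diffusion parameter (plays the role of a length-scale)
K = expm(-beta * L)                 # diffusion kernel: covariance between all node pairs

# GP regression: condition on rewards observed at a subset of nodes.
obs = np.array([0, 2])              # indices of nodes already sampled (illustrative)
y = np.array([1.0, 0.5])            # rewards observed there (illustrative)
noise = 0.1

K_oo = K[np.ix_(obs, obs)] + noise * np.eye(len(obs))
K_ao = K[:, obs]
mu = K_ao @ np.linalg.solve(K_oo, y)                                  # posterior mean per node
var = np.diag(K) - np.einsum('ij,ji->i', K_ao,
                             np.linalg.solve(K_oo, K_ao.T))           # posterior variance per node
```

The posterior mean `mu` generalizes observed rewards to neighboring nodes in proportion to graph diffusion, and the posterior variance `var` supplies the uncertainty signal that exploration strategies (e.g., upper confidence bounds) can exploit.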
