Structured, uncertainty-driven exploration in real-world consumer choice

Significance: We study how people make choices among a large number of options when they have limited experience. In a large dataset of online food delivery purchases, we find evidence for sophisticated exploration strategies predicted by contemporary theories. People actively seek to reduce their uncertainty about restaurants and use similarity-based generalization to guide their selections. Our findings suggest that theories of exploratory choice have real-world validity.

Making good decisions requires people to appropriately explore their available options and generalize what they have learned. While computational models can explain exploratory behavior in constrained laboratory tasks, it is unclear to what extent these models generalize to real-world choice problems. We investigate the factors guiding exploratory behavior in a dataset consisting of 195,333 customers placing 1,613,967 orders from a large online food delivery service. We find important hallmarks of adaptive exploration and generalization, which we analyze using computational models. In particular, customers seem to engage in uncertainty-directed exploration and use feature-based generalization to guide their exploration. Our results provide evidence that people use sophisticated strategies to explore complex, real-world environments.
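As an illustration of what uncertainty-directed exploration can look like, here is a minimal sketch. The text above does not specify the models used, so this is an assumption: it uses an upper-confidence-bound rule on top of a simple Gaussian belief update, which is one common way to formalize the idea. The names `ucb_choice`, `update_belief`, `beta`, and `noise_variance`, as well as the restaurant values, are hypothetical and chosen only for illustration.

```python
import math


def ucb_choice(means, variances, beta=1.0):
    """Pick the option with the highest mean + beta * standard deviation.

    beta is an assumed exploration weight: beta = 0 is purely greedy,
    larger beta favors options whose value is more uncertain.
    """
    scores = [m + beta * math.sqrt(v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_belief(mean, variance, reward, noise_variance=1.0):
    """Gaussian (Kalman-style) update of one option's belief after a reward."""
    gain = variance / (variance + noise_variance)
    new_mean = mean + gain * (reward - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance


# Hypothetical beliefs about three restaurants (expected rating, uncertainty).
means = [4.0, 3.8, 3.0]
variances = [0.01, 1.0, 0.25]

greedy = ucb_choice(means, variances, beta=0.0)    # ignores uncertainty
directed = ucb_choice(means, variances, beta=1.0)  # rewards uncertainty

# Ordering from the uncertain restaurant shrinks its variance afterwards.
new_mean, new_variance = update_belief(means[directed], variances[directed], reward=4.8)
```

With these made-up numbers, the greedy rule picks the well-known first restaurant, while the uncertainty-weighted rule picks the second, whose expected rating is slightly lower but far less certain. After one order, the variance of that option is halved, so the bonus for exploring it again is smaller.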
