Bayesian deterministic decision making: a normative account of the operant matching law and heavy-tailed reward history dependency of choices

The decision-making behavior of humans and animals adapts so as to satisfy an “operant matching law” in certain types of tasks. This was first pointed out by Herrnstein in his foraging experiments on pigeons. The matching law has been a landmark for elucidating the processes underlying decision making and its learning in the brain. An interesting question is whether decisions are made deterministically or probabilistically. Conventional learning models of the matching law are based on the latter idea: they assume that subjects learn the choice probabilities of the respective alternatives and decide stochastically according to those probabilities. However, it is unknown whether the matching law can also be accounted for by a deterministic strategy. To answer this question, we propose several deterministic Bayesian decision-making models that hold certain incorrect beliefs about the environment. We claim that a simple model produces behavior satisfying the matching law in static settings of a foraging task but not in dynamic settings. We further find that a model that believes the environment to be volatile works well in the dynamic foraging task and exhibits undermatching, a slight deviation from the matching law observed in many experiments. This model also reproduces the double-exponential dependence of choice on reward history and a heavier-tailed run-length distribution, both of which have recently been reported in experiments on monkeys.
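For two alternatives, the matching law states that the fraction of choices allocated to an alternative equals the fraction of rewards obtained from it: B_1/(B_1 + B_2) = R_1/(R_1 + R_2). Deviations such as undermatching are commonly quantified with Baum's generalized matching law, log(B_1/B_2) = s log(R_1/R_2) + log b. In this form, s = 1 is strict matching, s < 1 is undermatching, and b is a side bias. The sketch below fits s and b by ordinary least squares on per-session log ratios. The function name `fit_generalized_matching`, its input format and the synthetic counts are illustrative assumptions. They are not the authors' analysis code or data.

```python
import numpy as np


def fit_generalized_matching(choices, rewards):
    """Fit log(B1/B2) = s * log(R1/R2) + log(b) by least squares.

    Hypothetical helper: `choices` and `rewards` are arrays of shape
    (n_sessions, 2) holding per-session counts for alternatives 1 and 2.
    Returns (s, b): s == 1 is strict matching, s < 1 is undermatching,
    and b != 1 indicates a bias toward one alternative.
    """
    choices = np.asarray(choices, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    if np.any(choices <= 0) or np.any(rewards <= 0):
        raise ValueError("counts must be positive to take log ratios")
    x = np.log(rewards[:, 0] / rewards[:, 1])  # log reward ratio per session
    y = np.log(choices[:, 0] / choices[:, 1])  # log choice ratio per session
    s, log_b = np.polyfit(x, y, deg=1)         # returns [slope, intercept]
    return s, np.exp(log_b)


# Synthetic sessions built so that the choice ratio equals the reward
# ratio raised to the power 0.8, i.e. an undermatching subject.
reward_ratio = np.array([4.0, 2.0, 1.0, 0.5, 0.25])
rewards = np.column_stack([reward_ratio * 50, np.full(5, 50.0)])
choices = np.column_stack([reward_ratio ** 0.8 * 200, np.full(5, 200.0)])

s, b = fit_generalized_matching(choices, rewards)
print(f"sensitivity s = {s:.2f}, bias b = {b:.2f}")  # sensitivity s = 0.80, bias b = 1.00
```

Fitting in log-ratio space makes strict matching a straight line of slope 1 through the origin. Undermatching therefore appears simply as a shallower slope, independent of how many trials each session contains.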
