Context-dependent decision-making: a simple Bayesian model

Many phenomena in animal learning can be explained by a context-learning process whereby an animal learns about different patterns of relationship between environmental variables. Differentiating between such environmental regimes or ‘contexts’ allows an animal to rapidly adapt its behaviour when context changes occur. The current work views animals as making sequential inferences about current context identity in a world assumed to be relatively stable but also capable of rapid switches to previously observed or entirely new contexts. We describe a novel decision-making model in which contexts are assumed to follow a Chinese restaurant process with inertia and full Bayesian inference is approximated by a sequential-sampling scheme in which only a single hypothesis about current context is maintained. Actions are selected via Thompson sampling, allowing uncertainty in parameters to drive exploration in a straightforward manner. The model is tested on simple two-alternative choice problems with switching reinforcement schedules and the results compared with rat behavioural data from a number of T-maze studies. The model successfully replicates a number of important behavioural effects: spontaneous recovery, the effect of partial reinforcement on extinction and reversal, the overtraining reversal effect, and serial reversal-learning effects.
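The ingredients named above (a Chinese restaurant process prior with inertia, a single maintained context hypothesis, and Thompson sampling) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Beta-Bernoulli reward model, the form of the inertia term (an extra weight `kappa` on the current context), the parameter values, and all function and class names are assumptions introduced here for illustration only.

```python
import random


def sticky_crp_prior(counts, current, alpha=1.0, kappa=5.0):
    """Prior over the next context (assumed form of 'CRP with inertia').

    Existing contexts are weighted by their visit counts, the current
    context gets an extra weight kappa, and a brand-new context gets
    weight alpha. The last entry of the result stands for a new context.
    """
    weights = [float(n) for n in counts]
    weights[current] += kappa
    weights.append(alpha)
    total = sum(weights)
    return [w / total for w in weights]


def predictive(params, arm, reward):
    """Posterior-predictive probability of a binary reward under Beta(a, b)."""
    a, b = params[arm]
    p = a / (a + b)
    return p if reward else 1.0 - p


class SingleHypothesisAgent:
    """Keeps one sampled hypothesis about the current context (hypothetical)."""

    def __init__(self, n_arms=2, alpha=1.0, kappa=5.0, rng=None):
        self.n_arms = n_arms
        self.alpha, self.kappa = alpha, kappa
        self.rng = rng or random.Random(0)
        # Each context stores one Beta(a, b) pair per arm, starting at Beta(1, 1).
        self.contexts = [[[1.0, 1.0] for _ in range(n_arms)]]
        self.counts = [1]
        self.current = 0

    def act(self):
        # Thompson sampling: draw a reward probability per arm, pick the largest.
        draws = [self.rng.betavariate(a, b) for a, b in self.contexts[self.current]]
        return max(range(self.n_arms), key=lambda i: draws[i])

    def update(self, arm, reward):
        prior = sticky_crp_prior(self.counts, self.current, self.alpha, self.kappa)
        fresh = [[1.0, 1.0] for _ in range(self.n_arms)]
        candidates = self.contexts + [fresh]
        # Unnormalised posterior over context identity given this one outcome.
        post = [p * predictive(c, arm, reward) for p, c in zip(prior, candidates)]
        # Sample a single context instead of tracking the full posterior.
        k = self.rng.choices(range(len(post)), weights=post)[0]
        if k == len(self.contexts):
            self.contexts.append(fresh)
            self.counts.append(0)
        self.current = k
        self.counts[k] += 1
        # Update only the sampled context's reward belief for the chosen arm.
        self.contexts[k][arm][0 if reward else 1] += 1.0


def run(n_trials=200, switch=100, seed=0):
    """Two-alternative task whose reward probabilities reverse at `switch`."""
    rng = random.Random(seed)
    agent = SingleHypothesisAgent(rng=rng)
    choices = []
    for t in range(n_trials):
        probs = [0.9, 0.1] if t < switch else [0.1, 0.9]
        arm = agent.act()
        reward = rng.random() < probs[arm]
        agent.update(arm, reward)
        choices.append(arm)
    return agent, choices
```

Calling `run()` returns the agent and its sequence of choices on a hypothetical reversal schedule. The 0.9/0.1 reward probabilities and the switch point are illustrative values, not the schedules used in the T-maze studies, and the sketch makes no claim to reproduce the behavioural effects reported above.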
