A Mixture of Delta-Rules Approximation to Bayesian Inference in Change-Point Problems

Error-driven learning rules have received considerable attention because of their close relationships to both optimal theory and neurobiological mechanisms. However, basic forms of these rules are effective only under a restricted set of conditions in which the environment is stable. Recent studies have defined optimal solutions to learning problems in more general, potentially unstable, environments, but the relevance of these complex mathematical solutions to how the brain solves these problems remains unclear. Here, we show that one such Bayesian solution can be approximated by a computationally straightforward mixture of simple error-driven ‘Delta’ rules. This simpler model makes effective inferences in a dynamic environment and, with a mixture of only a small number of Delta rules, matches human performance on a predictive-inference task. The model thus advances our understanding of how the brain can use relatively simple computations to make nearly optimal inferences in a dynamic world.
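To make the idea of a mixture of Delta rules concrete, the following is a minimal illustrative sketch, not the model as specified in the paper. It assumes Gaussian observation noise, a fixed set of learning rates, and a simple reweighting scheme in which each rule's weight follows the likelihood of the latest outcome under that rule's current estimate and is then mixed toward uniform at a hazard rate. The function names and the parameters `noise_sd` and `hazard` are hypothetical choices made for illustration.

```python
import math


def delta_update(estimate, outcome, learning_rate):
    """One Delta-rule step: move the estimate toward the outcome
    by a fraction (learning_rate) of the prediction error."""
    return estimate + learning_rate * (outcome - estimate)


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed noise model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_of_delta_rules(outcomes, learning_rates, noise_sd=1.0,
                           hazard=0.1, initial_estimate=0.0):
    """Run several Delta rules in parallel and combine their estimates.

    Returns (predictions, weights, estimates), where predictions[t] is the
    weighted prediction made *before* seeing outcomes[t].
    """
    n = len(learning_rates)
    estimates = [initial_estimate] * n
    weights = [1.0 / n] * n
    predictions = []

    for x in outcomes:
        # Mixture prediction: weighted average of the individual estimates.
        predictions.append(sum(w * m for w, m in zip(weights, estimates)))

        # Reweight each rule by how well it predicted this outcome.
        posterior = [w * gaussian_pdf(x, m, noise_sd)
                     for w, m in zip(weights, estimates)]
        total = sum(posterior)
        if total > 0.0:
            posterior = [p / total for p in posterior]
        else:
            # All likelihoods underflowed; fall back to uniform weights.
            posterior = [1.0 / n] * n

        # Assumed hazard-rate mixing: keeps every rule available so the
        # mixture can switch rules after an abrupt change.
        weights = [(1.0 - hazard) * p + hazard / n for p in posterior]

        # Each rule applies its own error-driven update.
        estimates = [delta_update(m, x, a)
                     for m, a in zip(estimates, learning_rates)]

    return predictions, weights, estimates


if __name__ == "__main__":
    # Stable period at 0, then an abrupt change point to 10.
    data = [0.0] * 20 + [10.0] * 5
    preds, w, est = mixture_of_delta_rules(data, learning_rates=[0.05, 0.9])
    print("final prediction:", round(preds[-1], 2))
    print("final weights:", [round(v, 3) for v in w])
```

In this sketch, a rule with a low learning rate averages over many outcomes and suits stable periods, while a rule with a high learning rate tracks the most recent outcome and takes over after a change point. The reweighting step is what lets the mixture shift between the two.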
