A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker

Rate distortion theory describes how to communicate relevant information most efficiently over a channel with limited capacity. One of its many applications is bounded rational decision making, where decision makers are modeled as information channels that transform sensory input into motor output under the constraint that their channel capacity is limited. Such a bounded rational decision maker can be thought of as optimizing an objective function that trades off the decision maker’s utility or cumulative reward against the information processing cost, measured by the mutual information between sensory input and motor output. In this study, we interpret a spiking neuron as a bounded rational decision maker that aims to maximize its expected reward under the computational constraint that the mutual information between the neuron’s input and output is bounded from above. This abstract computational constraint translates into a penalty on the deviation between the neuron’s instantaneous and average firing behavior. We derive a synaptic weight update rule for such a rate distortion optimizing neuron and show in simulations that the neuron efficiently extracts reward-relevant information from the input by trading off its synaptic strengths against the collected reward.
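The trade-off between expected utility and mutual information can be illustrated with a small discrete example. The following is a minimal sketch, not the spiking-neuron model or the synaptic update rule of this study: it assumes a finite set of inputs x and outputs y, a hypothetical utility matrix `U[x, y]`, and a trade-off parameter `beta`, and it uses a standard Blahut-Arimoto-style alternating iteration to find a policy p(y|x) for the objective E[U] - (1/beta) I(X;Y). All function and variable names are illustrative. Because I(X;Y) is the expected Kullback-Leibler divergence between p(y|x) and the marginal p(y), the cost term penalizes input-specific behavior that deviates from average behavior.

```python
import numpy as np

def bounded_rational_policy(U, p_x, beta, n_iter=200):
    """Alternating (Blahut-Arimoto-style) iteration for
    max_{p(y|x)} E[U] - (1/beta) * I(X;Y) with discrete x and y.
    U, p_x and beta are illustrative assumptions, not taken from the paper."""
    n_x, n_y = U.shape
    p_y = np.full(n_y, 1.0 / n_y)  # start from a uniform output marginal
    for _ in range(n_iter):
        # Policy step: p(y|x) is proportional to p(y) * exp(beta * U(x, y)).
        logits = np.log(np.maximum(p_y, 1e-300))[None, :] + beta * U
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p_y_given_x = np.exp(logits)
        p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True)
        # Marginal step: p(y) = sum_x p(x) p(y|x), the "average behavior".
        p_y = p_x @ p_y_given_x
    return p_y_given_x, p_y

def mutual_information(p_y_given_x, p_x):
    """I(X;Y) in nats, i.e. the expected KL divergence of p(y|x) from p(y)."""
    p_y = p_x @ p_y_given_x
    joint = p_x[:, None] * p_y_given_x
    mask = joint > 0  # skip zero-probability terms (0 * log 0 = 0)
    ratio = p_y_given_x[mask] / np.broadcast_to(p_y, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Toy problem: two inputs, two outputs, reward 1 when the output matches the input.
U = np.eye(2)
p_x = np.array([0.5, 0.5])

policy_low, _ = bounded_rational_policy(U, p_x, beta=0.0)   # information is very costly
policy_high, _ = bounded_rational_policy(U, p_x, beta=5.0)  # information is cheap

mi_low = mutual_information(policy_low, p_x)
mi_high = mutual_information(policy_high, p_x)
reward_low = float(np.sum(p_x[:, None] * policy_low * U))
reward_high = float(np.sum(p_x[:, None] * policy_high * U))
```

In this toy setting a small `beta` yields a policy that ignores the input (zero mutual information, chance-level reward), while a larger `beta` yields a nearly deterministic policy that collects more reward at a higher information cost.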
