Reinforcement Learning or Active Inference?

This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and their sampling of the environment to minimise their free energy. Such agents learn the causal structure of the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notions of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof of concept may be important because the free-energy formulation furnishes a unified account of both action and perception, and may speak to a reappraisal of the role of dopamine in the brain.
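For readers unfamiliar with the benchmark, the following is a minimal sketch of the mountain-car problem itself, not of the free-energy scheme proposed in the paper. The equations of motion and constants are assumptions taken from the common textbook version of the task, and the names `mountain_car_step` and `rollout` are hypothetical; the paper's own formulation may differ. The sketch only illustrates why the task is a useful test: the car's engine is too weak to climb the hill directly, so the car must first move away from the goal to gain momentum.

```python
import math


def mountain_car_step(position, velocity, action):
    """Advance the textbook mountain-car dynamics by one time step.

    action is -1 (push left), 0 (coast) or +1 (push right).
    The constants are the usual textbook values, assumed here for illustration.
    """
    velocity += 0.001 * action - 0.0025 * math.cos(3.0 * position)
    velocity = min(max(velocity, -0.07), 0.07)  # speed limit
    position += velocity
    if position < -1.2:  # inelastic wall on the left
        position, velocity = -1.2, 0.0
    return position, velocity


def rollout(policy, steps=1000):
    """Run a policy from the valley floor.

    Returns the number of steps taken to reach the goal (position >= 0.5),
    or None if the goal is not reached within `steps`.
    """
    position, velocity = -0.5, 0.0
    for t in range(steps):
        action = policy(position, velocity)
        position, velocity = mountain_car_step(position, velocity, action)
        if position >= 0.5:
            return t + 1
    return None


def always_right(position, velocity):
    """Naive policy: drive straight at the goal."""
    return 1


def pump(position, velocity):
    """Hand-written policy: push in the current direction of travel."""
    return 1 if velocity >= 0 else -1


if __name__ == "__main__":
    print("always right:", rollout(always_right))
    print("energy pumping:", rollout(pump))
```

Under these assumed dynamics the naive policy never reaches the goal, whereas the hand-written pumping policy does. Both policies are fixed by hand here; the claim in the abstract is that comparable behaviour can emerge from free-energy minimisation without an explicit reward signal.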
