A Mathematical Walkthrough and Discussion of the Free Energy Principle

arXiv:2108.13343v2 [cs.AI] 1 Oct 2021. A preprint, 5th October, 2021.

The Free Energy Principle (FEP) is an influential and controversial theory that postulates a deep connection between the stochastic thermodynamics of self-organization and learning through variational inference. Specifically, it claims that any self-organizing system that can be statistically separated from its environment, and that maintains itself at a non-equilibrium steady state, can be construed as minimizing an information-theoretic functional (the variational free energy) and thus as performing variational Bayesian inference about the hidden state of its environment. The principle has also been applied extensively in neuroscience, and it is beginning to make inroads in machine learning by spurring the construction of novel algorithms in which action, perception, and learning are unified under a single objective. While its expansive and often grandiose claims have sparked significant debate in both philosophy and theoretical neuroscience, the mathematical depth of the theory and the lack of accessible introductions and tutorials to its core claims have often made productive discussion challenging. Here, we aim to provide a mathematically detailed yet intuitive walkthrough of the formulation and central claims of the FEP, together with a discussion of the assumptions it requires and its potential limitations. Additionally, since the FEP is a living theory, subject to internal controversy, change, and revision, we also present a detailed appendix that highlights and condenses current perspectives on, and controversies about, the nature and applicability of the FEP and the mathematical assumptions and formalisms that underlie it.
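The central quantitative claim above, that minimizing the variational free energy amounts to variational Bayesian inference, can be illustrated on a toy discrete model. The sketch below is a minimal illustration only, assuming a hypothetical two-state generative model whose prior and likelihood values are made up (they are not taken from this paper); the names `prior`, `likelihood`, `free_energy`, and `softmax` are our own. It relies on the standard identity F[q] = KL[q(x) || p(x | o)] - log p(o), so F is an upper bound on the surprisal -log p(o), and the bound is tight exactly when q(x) equals the exact posterior.

```python
import numpy as np

# Hypothetical toy generative model (illustrative numbers, not from the paper):
# a binary hidden state x and a single observed outcome o.
prior = np.array([0.7, 0.3])       # p(x)
likelihood = np.array([0.2, 0.9])  # p(o | x), evaluated at the observed o

joint = prior * likelihood         # p(o, x)
evidence = joint.sum()             # p(o), the model evidence
posterior = joint / evidence       # exact Bayesian posterior p(x | o)
surprisal = -np.log(evidence)      # -log p(o)


def free_energy(q):
    """Variational free energy F[q] = E_q[log q(x) - log p(o, x)]."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(q) - np.log(joint))))


def softmax(theta):
    """Map unconstrained logits to a valid distribution q(x)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


# 1) F upper-bounds the surprisal for every candidate q(x) ...
for a in np.linspace(0.01, 0.99, 99):
    assert free_energy([a, 1.0 - a]) >= surprisal - 1e-12

# ... and the bound is tight at the exact posterior.
assert np.isclose(free_energy(posterior), surprisal)

# 2) Minimizing F by gradient descent on the logits of q recovers the posterior.
theta = np.zeros(2)                # start from a uniform q(x)
learning_rate = 0.5
for _ in range(500):
    q = softmax(theta)
    err = np.log(q) - np.log(joint)     # per-state log q(x) - log p(o, x)
    grad = q * (err - np.sum(q * err))  # dF/dtheta via the softmax Jacobian
    theta -= learning_rate * grad

q_opt = softmax(theta)
print("variational q:  ", q_opt)
print("exact posterior:", posterior)
```

Plain gradient descent on softmax logits is our own choice for illustration and is not an update scheme taken from the paper; any procedure that drives F to its minimum would recover the same posterior, because the KL term vanishes only at q(x) = p(x | o).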
