Deep Active Inference as Variational Policy Gradients

Active Inference is a theory of action arising from neuroscience that casts action and planning as a Bayesian inference problem solved by minimizing a single quantity: the variational free energy. Active Inference promises a unified account of action and perception, coupled with a biologically plausible process theory. Despite these potential advantages, current implementations of Active Inference handle only small, discrete policy and state spaces and typically require the environmental dynamics to be known. In this paper we propose a novel deep Active Inference algorithm that approximates the key densities with deep neural networks, used as flexible function approximators, enabling Active Inference to scale to significantly larger and more complex tasks. We demonstrate our approach on a suite of OpenAI Gym benchmark tasks and obtain performance comparable to common reinforcement learning baselines. Moreover, our algorithm exhibits similarities to maximum-entropy reinforcement learning and the policy gradient algorithm, revealing interesting connections between the Active Inference framework and reinforcement learning.
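The connection the abstract draws between Active Inference and maximum-entropy reinforcement learning can be made concrete with a minimal sketch. The snippet below is illustrative only, not the paper's algorithm: it shows a free-energy-style objective for a discrete softmax policy, where minimizing the loss trades off extrinsic value (expected return) against uncertainty (the entropy of the action distribution), exactly the structure of a maximum-entropy policy-gradient objective. The action values, logits, and the temperature `beta` are hypothetical numbers chosen for the example.

```python
import math

def softmax(logits):
    """Convert raw scores into a normalized action distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def free_energy_style_loss(logits, q_values, beta=0.1):
    """Negative (expected return + beta * policy entropy).

    Minimizing this mirrors the maximum-entropy policy-gradient
    objective the abstract relates to variational free energy:
    the agent balances reward-seeking against keeping its action
    distribution uncertain (high entropy).
    """
    pi = softmax(logits)
    expected_return = sum(p * q for p, q in zip(pi, q_values))
    entropy = -sum(p * math.log(p + 1e-12) for p in pi)
    return -(expected_return + beta * entropy)

# Hypothetical example: three actions with assumed action values.
logits = [1.0, 0.5, -0.5]
q_values = [1.0, 0.2, 0.0]
loss = free_energy_style_loss(logits, q_values)
```

In the full deep Active Inference setting, the policy and the value estimates would each be deep networks trained by gradient descent on such an objective, rather than fixed vectors as in this toy example.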
