Learning action-oriented models through active inference

Converging theories suggest that organisms learn and exploit probabilistic models of their environment. However, it remains unclear how such models can be learned in practice. The open-ended complexity of natural environments means that it is generally infeasible for organisms to model their environment comprehensively. Action-oriented models offer an alternative: they attempt to encode a parsimonious representation of adaptive agent-environment interactions. One approach to learning action-oriented models is to learn online in the presence of goal-directed behaviours. This constrains an agent to behaviourally relevant trajectories, reducing the diversity of the data a model need account for. Unfortunately, this approach can cause models to prematurely converge to sub-optimal solutions, through a process we refer to as a bad bootstrap. Here, we exploit the normative framework of active inference to show that efficient action-oriented models can be learned by balancing goal-oriented and epistemic (information-seeking) behaviours in a principled manner. We illustrate our approach using a simple agent-based model of bacterial chemotaxis. We first demonstrate that learning via goal-directed behaviour indeed constrains models to behaviourally relevant aspects of the environment, but that this approach is prone to sub-optimal convergence. We then demonstrate that epistemic behaviours facilitate the construction of accurate and comprehensive models, but that these models are not tailored to any specific behavioural niche and are therefore less efficient in their use of data. Finally, we show that active inference agents learn models that are parsimonious and tailored to action, and that avoid bad bootstraps and sub-optimal convergence. Critically, our results indicate that models learned through active inference can support adaptive behaviour in spite of, and indeed because of, their departure from veridical representations of the environment. Our approach provides a principled method for learning adaptive models from limited interactions with an environment, highlighting a route to sample-efficient learning algorithms.
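
As a rough illustration of how goal-oriented and epistemic behaviours can be balanced within a single objective, the minimal Python sketch below scores a policy using one common discrete-state form of expected free energy from the active inference literature: the negative sum of an extrinsic term (expected log-preference over outcomes) and an epistemic term (expected information gain about hidden states). The function name, the two-state toy distributions, and the "stay"/"probe" policy labels are assumptions made for illustration and are not taken from the paper, whose objective may include further terms.

```python
import math

def expected_free_energy(q_s, likelihood, log_pref):
    """Score one policy (hypothetical helper, not the paper's code).

    q_s        : predicted state distribution q(s | policy)
    likelihood : likelihood[s][o] = p(o | s)
    log_pref   : log_pref[o] = ln p(o), the agent's prior preferences
    Returns (extrinsic, epistemic, G) with G = -(extrinsic + epistemic).
    """
    n_s, n_o = len(q_s), len(log_pref)
    # Predicted outcomes: q(o | policy) = sum_s q(s | policy) p(o | s)
    q_o = [sum(q_s[s] * likelihood[s][o] for s in range(n_s)) for o in range(n_o)]
    # Extrinsic (goal-directed) value: expected log-preference of outcomes
    extrinsic = sum(q_o[o] * log_pref[o] for o in range(n_o))
    # Epistemic value: expected KL[q(s | o) || q(s)], i.e. the mutual
    # information between hidden states and outcomes under this policy
    epistemic = 0.0
    for s in range(n_s):
        for o in range(n_o):
            joint = q_s[s] * likelihood[s][o]
            if joint > 0.0:
                epistemic += joint * math.log(likelihood[s][o] / q_o[o])
    return extrinsic, epistemic, -(extrinsic + epistemic)

# Toy comparison: the agent is uncertain about the state and has flat preferences.
q_s = [0.5, 0.5]
flat_pref = [math.log(0.5), math.log(0.5)]
stay = expected_free_energy(q_s, [[0.5, 0.5], [0.5, 0.5]], flat_pref)   # uninformative outcomes
probe = expected_free_energy(q_s, [[0.9, 0.1], [0.1, 0.9]], flat_pref)  # informative outcomes
print("stay  G =", round(stay[2], 3))
print("probe G =", round(probe[2], 3))
```

With flat preferences the extrinsic terms are equal, so the policy with informative outcomes receives the lower expected free energy. With non-flat preferences the same score trades information gain against preferred outcomes.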


[37]  Christopher L. Buckley,et al.  Generative models as parsimonious descriptions of sensorimotor loops , 2019, Behavioral and Brain Sciences.

[38]  P. Dayan,et al.  Goals and Habits in the Brain , 2013, Neuron.

[39]  Tom Froese,et al.  Where There is Life There is Mind: In Support of a Strong Life-Mind Continuity Thesis , 2017, Entropy.

[40]  Daniel Williams,et al.  Predictive Processing and the Representation Wars , 2017, Minds and Machines.

[41]  Matthew J. Beal Variational algorithms for approximate Bayesian inference , 2003 .

[42]  Xabier E. Barandiaran,et al.  Modeling habits as self-sustaining patterns of sensorimotor behavior , 2014, Front. Hum. Neurosci..

[43]  Karl J. Friston,et al.  Computational mechanisms of curiosity and goal-directed exploration , 2018, bioRxiv.

[44]  Teodor Negru,et al.  SELF-ORGANIZATION, AUTOPOIESIS, FREE-ENERGY PRINCIPLE AND AUTONOMY , 2018 .

[45]  Ian Robertson,et al.  Enactivism and predictive processing: a non-representational view , 2018 .

[46]  Karl J. Friston,et al.  Free-energy and the brain , 2007, Synthese.

[47]  Sergey Levine,et al.  Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.

[48]  W. Ashby,et al.  Every Good Regulator of a System Must Be a Model of That System , 1970 .

[49]  Wanja Wiese,et al.  Action Is Enabled by Systematic Misrepresentations , 2017 .

[50]  Felix D. Schönbrodt,et al.  When misrepresentations are successful , 2015 .

[51]  Giovanni Pezzulo,et al.  Model-Based Approaches to Active Perception and Control , 2017, Entropy.

[52]  Karl J. Friston,et al.  Generalised free energy and active inference: can the future cause the past? , 2018 .

[53]  Ari Weinstein,et al.  Model-based hierarchical reinforcement learning and human action control , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[54]  Chris Thornton,et al.  Gauging the value of good data: Informational embodiment quantification , 2010, Adapt. Behav..

[55]  Karl J. Friston,et al.  A tale of two densities: active inference is enactive inference , 2019, Adapt. Behav..

[56]  Karl J. Friston,et al.  Reinforcement Learning or Active Inference? , 2009, PloS one.

[57]  N. Daw,et al.  The ubiquity of model-based reinforcement learning , 2012, Current Opinion in Neurobiology.

[58]  Karl J. Friston,et al.  The Active Inference Approach to Ecological Perception: General Information Dynamics for Natural and Artificial Embodied Cognition , 2018, Front. Robot. AI.

[59]  Yi Sun,et al.  Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments , 2011, AGI.

[60]  Jon M. Kleinberg,et al.  Incentivizing exploration , 2014, EC.

[61]  Richard S. Sutton,et al.  Model-Based Reinforcement Learning with an Approximate, Learned Model , 1996 .

[62]  Michael Kühl,et al.  Bacteria are not too small for spatial sensing of chemical gradients: An experimental evidence , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[63]  Simon McGregor,et al.  The free energy principle for action and perception: A mathematical review , 2017, 1705.09156.

[64]  D. Knill,et al.  The Bayesian brain: the role of uncertainty in neural coding and computation , 2004, Trends in Neurosciences.

[65]  Karl J. Friston The free-energy principle: a unified brain theory? , 2010, Nature Reviews Neuroscience.

[66]  Jürgen Schmidhuber,et al.  World Models , 2018, ArXiv.

[67]  A. Borst Seeing smells: imaging olfactory learning in bees , 1999, Nature Neuroscience.

[68]  Alexei A. Efros,et al.  Large-Scale Study of Curiosity-Driven Learning , 2018, ICLR.

[69]  Christopher G. Atkeson,et al.  A comparison of direct and model-based reinforcement learning , 1997, Proceedings of International Conference on Robotics and Automation.

[70]  Karl J. Friston,et al.  Active Inference, Curiosity and Insight , 2017, Neural Computation.

[71]  Scott Cheng-Hsin Yang,et al.  Theoretical perspectives on active sensing , 2016, Current Opinion in Behavioral Sciences.

[72]  Karl J. Friston,et al.  Deep temporal models and active inference , 2017, Neuroscience & Biobehavioral Reviews.

[73]  Pierre Baldi,et al.  Of bits and wows: A Bayesian theory of surprise with applications to attention , 2010, Neural Networks.

[74]  Karl J. Friston,et al.  The Markov blankets of life: autonomy, active inference and the free energy principle , 2018, Journal of The Royal Society Interface.

[75]  Keyan Zahedi,et al.  A Theory of Cheap Control in Embodied Systems , 2014, PLoS Comput. Biol..

[76]  M. Lungarella,et al.  Information Self-Structuring: Key Principle for Learning and Development , 2005, Proceedings. The 4nd International Conference on Development and Learning, 2005..