Action and Perception as Divergence Minimization

We introduce a unified objective for action and perception of intelligent agents. Extending representation learning and control, we minimize the joint divergence between the combined system of agent and environment and a target distribution. Intuitively, such agents use perception to align their beliefs with the world, and use actions to align the world with their beliefs. Minimizing the joint divergence to an expressive target maximizes the mutual information between the agent's representations and inputs, thus inferring representations that are informative of past inputs and exploring future inputs that are informative of the representations. This lets us explain intrinsic objectives, such as representation learning, information gain, empowerment, and skill discovery, from minimal assumptions. Moreover, interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek large niches in their environments, rendering task rewards optional. The framework provides a common language for comparing a wide range of objectives, advances the understanding of latent variables for decision making, and offers a recipe for designing novel objectives. We recommend deriving future agent objectives from the joint divergence to facilitate comparison, to point out the agent's target distribution, and to identify the intrinsic objective terms needed to reach that distribution.
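
A minimal sketch of the objective in symbols, using notation assumed here rather than quoted from the paper: let x denote the agent's inputs, z its representations, p the joint distribution realized by agent and environment, and τ the target distribution. For a latent variable target τ(x, z) = τ(x | z) τ(z), the joint divergence decomposes as

\[
\mathrm{KL}\big[\,p(x,z)\,\|\,\tau(x,z)\,\big]
  \;=\; -\,\mathrm{H}_p(x,z)\;-\;\mathbb{E}_{p}\big[\ln \tau(x \mid z)\big]\;-\;\mathbb{E}_{p}\big[\ln \tau(z)\big],
\]

and the reconstruction term is a variational (Barber–Agakov) lower bound on the mutual information,

\[
\mathbb{E}_{p}\big[\ln \tau(x \mid z)\big] \;+\; \mathrm{H}_p(x) \;\le\; \mathrm{I}_p(x;z),
\]

which is tight when τ(x | z) matches p(x | z). Under these assumptions, driving the joint divergence down with an expressive target pushes up a lower bound on I_p(x; z): perception infers representations that are informative of past inputs, and actions seek future inputs that are informative of the representations.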
