Joint Perception and Control as Inference with an Object-based Implementation

Existing model-based reinforcement learning methods often treat perception modeling and decision making as separate problems. We introduce joint Perception and Control as Inference (PCI), a general framework that unifies perception and control for partially observable environments through Bayesian inference. Motivated by evidence that object-level inductive biases are critical to human perceptual learning and reasoning, we propose Object-based Perception Control (OPC), an instantiation of PCI that facilitates control with automatically discovered object-based representations. We develop an unsupervised end-to-end solution and analyze the convergence of the perception-model update. Experiments in a high-dimensional pixel environment demonstrate the effectiveness of our object-based perception control approach: OPC achieves high perceptual-grouping quality and outperforms several strong baselines in accumulated reward.
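
The paper's exact objective is not reproduced here, but as a minimal sketch of how perception and control can combine in a single Bayesian objective, assume standard control-as-inference notation: observations o_t, latent states s_t, actions a_t, and binary optimality variables O_t with p(O_t = 1 | s_t, a_t) proportional to exp(r(s_t, a_t)). A single variational posterior q over latent states and actions then yields a lower bound of the form

\[
\log p(o_{1:T}, \mathcal{O}_{1:T} = 1)
\;\ge\;
\mathbb{E}_{q(s_{1:T}, a_{1:T} \mid o_{1:T})}\!\left[ \sum_{t=1}^{T} \log p(o_t \mid s_t) + r(s_t, a_t) \right]
- \mathrm{KL}\!\left( q(s_{1:T}, a_{1:T} \mid o_{1:T}) \,\|\, p(s_{1:T}, a_{1:T}) \right),
\]

where the reconstruction term trains the perception model, the reward term drives control, and the KL term regularizes both against the prior dynamics p(s_t | s_{t-1}, a_{t-1}) and an action prior. Because the reconstruction and reward terms share the same posterior q, perception and control are optimized jointly rather than separately; in an object-based instantiation such as OPC, s_t would additionally factorize into per-object latents.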
