Joint Perception and Control as Inference with an Object-based Implementation

Existing model-based reinforcement learning methods often treat perception modeling and decision making as separate problems. We introduce joint Perception and Control as Inference (PCI), a general framework that unifies perception and control for partially observable environments through Bayesian inference. Motivated by evidence that object-level inductive biases are critical to human perceptual learning and reasoning, we propose Object-based Perception Control (OPC), an instantiation of PCI that facilitates control with automatically discovered object-based representations. We develop an unsupervised end-to-end solution and analyze the convergence of the perception-model update. Experiments in a high-dimensional pixel environment demonstrate the effectiveness of our object-based perception control approach: OPC achieves high perceptual-grouping quality and outperforms several strong baselines in accumulated reward.
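
The paper's exact objective is not reproduced here, but as a minimal sketch of how perception and control can combine in a single Bayesian objective, assume standard control-as-inference notation: observations o_t, latent states s_t, actions a_t, and binary optimality variables O_t with p(O_t = 1 | s_t, a_t) proportional to exp(r(s_t, a_t)). A single variational posterior q over latent states and actions then yields a lower bound of the form

\[
\log p(o_{1:T}, \mathcal{O}_{1:T} = 1)
\;\ge\;
\mathbb{E}_{q(s_{1:T}, a_{1:T} \mid o_{1:T})}\!\left[ \sum_{t=1}^{T} \log p(o_t \mid s_t) + r(s_t, a_t) \right]
- \mathrm{KL}\!\left( q(s_{1:T}, a_{1:T} \mid o_{1:T}) \,\|\, p(s_{1:T}, a_{1:T}) \right),
\]

where the reconstruction term trains the perception model, the reward term drives control, and the KL term regularizes both against the prior dynamics p(s_t | s_{t-1}, a_{t-1}) and an action prior. Because the reconstruction and reward terms share the same posterior q, perception and control are optimized jointly rather than separately; in an object-based instantiation such as OPC, s_t would additionally factorize into per-object latents.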
