Learning and Planning with a Semantic Model

Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progress on this challenge in the context of man-made environments, which are visually diverse but share intrinsic semantic regularities. We propose a hybrid model-based and model-free approach, LEArning and Planning with Semantics (LEAPS), consisting of a multi-target sub-policy that acts on visual inputs and a Bayesian model over semantic structures. When placed in an unseen environment, the agent plans with the semantic model to make high-level decisions, proposes the next sub-target for the sub-policy to execute, and updates the semantic model based on new observations. We perform experiments on visual navigation tasks in House3D, a 3D environment containing diverse human-designed indoor scenes with real-world objects. LEAPS outperforms strong baselines that do not explicitly plan using the semantic content.
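At test time the agent alternates between planning over the semantic model and executing the visual sub-policy. The sketch below illustrates that loop only; it is a minimal, hypothetical rendering of the description above, and every interface in it (`env`, `semantic_model`, `sub_policy`, the info-dictionary keys) is an assumption for illustration, not the paper's actual API.

```python
def run_episode(env, semantic_model, sub_policy, final_target, max_steps=500):
    """Sketch of the LEAPS test-time loop: plan, execute sub-policy, update model.

    All objects and method names here are hypothetical stand-ins, not the
    authors' implementation.
    """
    obs = env.reset()
    done, steps = False, 0
    while not done and steps < max_steps:
        # 1. Plan with the Bayesian semantic model to propose the next sub-target,
        #    e.g. the semantic concept believed to lie on the way to the goal.
        sub_target = semantic_model.plan(obs, goal=final_target)

        # 2. Run the multi-target sub-policy on raw visual input until the
        #    sub-target is reached or its time budget runs out.
        info = {}
        for _ in range(sub_policy.horizon):
            action = sub_policy.act(obs, sub_target)
            obs, done, info = env.step(action)
            steps += 1
            if done or info.get("sub_target_reached", False) or steps >= max_steps:
                break

        # 3. Update the semantic model's posterior from the new observations
        #    (e.g. which semantic entities were encountered and where).
        semantic_model.update(info.get("semantic_signals", {}))
    return done
```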
