Paper Information - Generalization to New Actions in Reinforcement Learning

Generalization to New Actions in Reinforcement Learning

A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at this https URL
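The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-pooling encoder, the bilinear scoring rule, and every name below (`embed_action`, `action_probabilities`, `weights`, `sample_observations`) are assumptions made for illustration, and no training or generalization objective is shown. The sketch only demonstrates why a policy that scores actions through their representations can accept an action set of a different size without changing its parameters.

```python
import math
import random


def embed_action(observations):
    """Stage 1 stand-in (assumption): mean-pool an action's observation vectors."""
    dim = len(observations[0])
    count = len(observations)
    return [sum(obs[i] for obs in observations) / count for i in range(dim)]


def action_probabilities(state, action_embeddings, weights):
    """Stage 2 stand-in (assumption): bilinear score, then softmax over the given set."""
    scores = []
    for emb in action_embeddings:
        score = sum(
            state[i] * weights[i][j] * emb[j]
            for i in range(len(state))
            for j in range(len(emb))
        )
        scores.append(score)
    # Subtract the maximum score for numerical stability before exponentiating.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
state_dim, emb_dim = 3, 2

# Hypothetical, untrained policy parameters shared across all action sets.
weights = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(state_dim)]
state = [0.5, -0.2, 1.0]


def sample_observations(n):
    """Hypothetical action information: n random observation vectors per action."""
    return [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(n)]


# An action set seen during training (3 actions) and an unseen one (5 actions).
train_actions = [embed_action(sample_observations(4)) for _ in range(3)]
new_actions = [embed_action(sample_observations(4)) for _ in range(5)]

train_probs = action_probabilities(state, train_actions, weights)
new_probs = action_probabilities(state, new_actions, weights)
print(len(train_probs), len(new_probs))
```

Because the parameters in `weights` depend only on the state and embedding dimensions, the same function yields a valid distribution over three or five actions. Whether such a policy selects good actions from an unseen set is the generalization question the paper studies.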

Joseph J. Lim | Ayush Jain | Andrew Szot
