Generalization to New Actions in Reinforcement Learning

A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework in which the agent first infers action representations from action information acquired separately from the task. A policy that is flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool set to solve physical-reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at this https URL.
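As a rough sketch of the two-stage idea, the snippet below encodes each action's side information into an embedding, then lets a set-flexible policy score whatever action embeddings are available at decision time. This is a minimal illustration under assumed details, not the paper's actual architecture: the module names, layer sizes, and PyTorch framing are all hypothetical.

```python
import torch
import torch.nn as nn

class ActionEncoder(nn.Module):
    """Stage 1 (sketch): map per-action side information to an action representation."""
    def __init__(self, info_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(info_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, action_info: torch.Tensor) -> torch.Tensor:
        # action_info: (num_actions, info_dim) -> (num_actions, embed_dim)
        return self.net(action_info)

class SetFlexiblePolicy(nn.Module):
    """Stage 2 (sketch): score each available action embedding against the state,
    so the action set can change size and content between episodes."""
    def __init__(self, state_dim: int, embed_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state: torch.Tensor, action_embeds: torch.Tensor):
        # state: (state_dim,), action_embeds: (num_actions, embed_dim)
        expanded = state.unsqueeze(0).expand(action_embeds.size(0), -1)
        logits = self.scorer(torch.cat([expanded, action_embeds], dim=-1)).squeeze(-1)
        return torch.distributions.Categorical(logits=logits)

# Zero-shot use with a brand-new action set (hypothetical dimensions):
encoder = ActionEncoder(info_dim=16, embed_dim=32)
policy = SetFlexiblePolicy(state_dim=8, embed_dim=32)
new_action_info = torch.randn(5, 16)   # side information for 5 unseen actions
dist = policy(torch.randn(8), encoder(new_action_info))
action = dist.sample()                 # index into the current action set
```

Because the policy scores whatever embeddings it receives rather than indexing a fixed output head, a new action set changes only the inputs, not the network weights, which is what makes zero-shot evaluation possible in this framing.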
