Guiding Safe Reinforcement Learning Policies Using Structured Language Constraints

Reinforcement learning (RL) has shown success in solving complex sequential decision-making tasks when a well-defined reward function is available. For agents acting in the real world, these reward functions must be designed very carefully to ensure the agents act safely, especially when they must interact with humans and perform tasks in human environments. However, hand-crafting such a reward function often requires specialized expertise and quickly becomes difficult to scale with task complexity. This leads to the long-standing problem in reinforcement learning known as reward sparsity, in which sparse or poorly specified reward functions slow down the learning process and lead to sub-optimal policies and unsafe behaviors. To make matters worse, reward functions often need to be adjusted or re-specified for each new task the RL agent must learn. On the other hand, it is relatively easy for people to specify in language what an agent should or should not do in order to perform a task safely. Inspired by this, we propose a framework for training RL agents conditioned on constraints expressed as structured language, thus reducing the effort needed to design and integrate specialized rewards into the environment. In our experiments, we show that this method can ground the language to behaviors and enable the agent to solve tasks while following the constraints. We also show how the agent can transfer these skills to other tasks.
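
To make the idea of constraint conditioning concrete, the sketch below shows one plausible way such a policy could be structured: the observation and an embedding of a tokenized language constraint are concatenated and fed to a shared action head. This is a minimal illustration under assumed details, not the paper's implementation; the names (ConstrainedPolicy, constraint_tokens) and the mean-pooled embedding encoder are hypothetical choices.

```python
import torch
import torch.nn as nn

class ConstrainedPolicy(nn.Module):
    """Policy conditioned on both the observation and an embedding of a
    structured language constraint, e.g. "do not step on lava"."""

    def __init__(self, obs_dim: int, vocab_size: int, n_actions: int,
                 embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        # Token embeddings for the constraint text; mean-pooled in forward()
        # into a single fixed-size constraint vector.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, obs: torch.Tensor, constraint_tokens: torch.Tensor):
        # obs: (batch, obs_dim); constraint_tokens: (batch, seq_len) token ids
        constraint_vec = self.embed(constraint_tokens).mean(dim=1)
        logits = self.net(torch.cat([obs, constraint_vec], dim=-1))
        return torch.distributions.Categorical(logits=logits)

# Example usage: sample an action under a (hypothetically) tokenized constraint.
policy = ConstrainedPolicy(obs_dim=16, vocab_size=100, n_actions=4)
obs = torch.randn(1, 16)
constraint = torch.tensor([[5, 12, 7]])  # e.g. ids for "avoid", "the", "lava"
action = policy(obs, constraint).sample()
```

Because the constraint enters the policy as an input rather than as a hand-tuned reward term, a policy of this form could in principle follow a different constraint at test time simply by changing the text, which is the kind of transfer across tasks the abstract alludes to.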
