Interpretable Policy Specification and Synthesis through Natural Language and RL

Policy specification is the process by which a human initializes a robot's behaviour and, in turn, warm-starts policy optimization via Reinforcement Learning (RL). While policy specification is inherently a collaborative design process, modern methods based on Learning from Demonstration or Deep RL lack the interpretability and accessibility needed to support such collaboration. Current state-of-the-art methods for policy specification rely on black-box models, which are an insufficient means of collaboration for non-expert users: these models provide no way to inspect the policies the agent has learnt and offer no usable modality for teaching robot behaviour. In this paper, we propose a novel machine learning framework that enables humans to 1) specify, through natural language, interpretable policies in the form of easy-to-understand decision trees, 2) leverage these policies to warm-start reinforcement learning, and 3) outperform baselines that lack our natural language initialization mechanism. We train our approach on a first-of-its-kind corpus we collected, which maps free-form natural language policy descriptions to decision tree-based policies. We show that our framework translates natural language to decision trees with 96% and 97% accuracy on held-out corpora across two domains, respectively. Finally, we validate that policies initialized with natural language commands significantly outperform (p < 0.001) baselines that do not benefit from our natural language-based warm-start technique.
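
To make the warm-start idea concrete, the following is a minimal sketch (not the authors' implementation) of how a human-specified rule, once parsed from natural language into a decision tree, can be encoded as the initial weights of a small differentiable soft tree that gradient-based RL then refines, in the spirit of ProLoNet-style initialization. The class name, the toy rule, and the encoding constants below are all illustrative assumptions.

# Hypothetical sketch: the natural-language command "if the first state
# feature exceeds 0.5, take action 1; otherwise take action 0" is assumed to
# have been parsed into a one-node decision tree, which we encode as the
# initial weights of a differentiable soft tree policy.
import torch
import torch.nn as nn

class SoftTreePolicy(nn.Module):
    """One decision node and two leaves, warm-started from the rule above."""
    def __init__(self, state_dim: int = 1):
        super().__init__()
        # Decision node: sigmoid(w.s + b) is the probability of the "true"
        # branch; w selects feature 0 and b = -0.5 encodes the threshold 0.5.
        self.w = nn.Parameter(torch.tensor([[1.0] + [0.0] * (state_dim - 1)]))
        self.b = nn.Parameter(torch.tensor([-0.5]))
        # Leaf logits: the "false" leaf prefers action 0, the "true" leaf
        # prefers action 1.
        self.leaf_false = nn.Parameter(torch.tensor([2.0, -2.0]))
        self.leaf_true = nn.Parameter(torch.tensor([-2.0, 2.0]))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        p_true = torch.sigmoid(state @ self.w.t() + self.b)  # branch probability
        logits = p_true * self.leaf_true + (1.0 - p_true) * self.leaf_false
        return torch.softmax(logits, dim=-1)                 # action distribution

policy = SoftTreePolicy()
print(policy(torch.tensor([[0.9]])))  # ~[0.31, 0.69]: higher mass on action 1

Because every parameter of the soft tree still corresponds to a threshold or a leaf preference in the original rule, the policy can be read back as a decision tree even after RL fine-tuning (e.g., with a policy-gradient method such as PPO), which is what distinguishes this kind of warm start from black-box initialization.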
