Paper Information - Language to Rewards for Robotic Skill Synthesis

Language to Rewards for Robotic Skill Synthesis

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions have been shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-Policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
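The reward-as-interface idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the dictionary of reward parameters is hard-coded as a stand-in for what an LLM might emit, the robot is replaced by a hypothetical 1-D point whose velocity is the action, and a simple random-shooting planner stands in for MuJoCo MPC. All names (`reward_params`, `plan_first_action`, `run_episode`, and the parameter keys) are assumptions made for illustration.

```python
import random


def reward(x, u, params):
    """Weighted reward: stay near the target while using little effort."""
    position_error = x - params["target_position"]
    return -(params["position_weight"] * position_error ** 2
             + params["effort_weight"] * u ** 2)


def step(x, u, dt=0.1):
    """Toy dynamics (assumption): a 1-D point whose velocity is the action."""
    return x + dt * u


def plan_first_action(x, params, rng, horizon=10, num_samples=64):
    """Random-shooting MPC: sample action sequences, keep the best one."""
    best_return, best_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        sim_x, total = x, 0.0
        for u in actions:
            sim_x = step(sim_x, u)
            total += reward(sim_x, u, params)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action


def run_episode(params, x0=0.0, num_steps=100, seed=0):
    """Replan at every step and apply only the first planned action."""
    rng = random.Random(seed)
    x = x0
    for _ in range(num_steps):
        x = step(x, plan_first_action(x, params, rng))
    return x


# Stand-in for LLM output given an instruction such as "move to position 1".
reward_params = {"target_position": 1.0,
                 "position_weight": 1.0,
                 "effort_weight": 0.01}
final_x = run_episode(reward_params)

# A language correction ("no, go to -0.5 instead") only changes the parameters;
# the optimizer and dynamics are untouched.
corrected_params = dict(reward_params, target_position=-0.5)
final_x_corrected = run_episode(corrected_params)

print(f"final position: {final_x:.2f}, after correction: {final_x_corrected:.2f}")
```

The point of the sketch is the division of labor: the language side only has to produce a small set of reward parameters, and the optimizer turns them into low-level actions, so a correction amounts to editing parameters rather than rewriting a controller.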
