Translating Natural Language to Planning Goals with Large-Language Models

Recent large language models (LLMs) have demonstrated remarkable performance on a variety of natural language processing (NLP) tasks, generating intense excitement about their applicability across domains. Unfortunately, recent work has also shown that LLMs are unable to perform accurate reasoning or to solve planning problems, which may limit their usefulness for robotics-related tasks. In this work, our central question is whether LLMs can translate goals specified in natural language into a structured planning language. If so, an LLM can serve as a natural interface between the planner and human users: the translated goal can be handed off to domain-independent AI planners, which are highly effective at plan generation. Our empirical results on GPT-3.5 variants show that LLMs are far better suited to translation than to planning. We find that LLMs can leverage commonsense knowledge and reasoning to fill in missing details in under-specified goals (as is often the case in natural language). However, our experiments also reveal that LLMs can fail to generate goals for tasks involving numerical or physical (e.g., spatial) reasoning, and that they are sensitive to the prompts used. These models are therefore promising for translation into structured planning languages, but care should be taken in their use.
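As a concrete illustration of this pipeline, the sketch below prompts an LLM to translate a natural-language goal into a PDDL goal expression and then hands the resulting problem file to a domain-independent planner. This is a minimal sketch, not the paper's implementation: the few-shot prompt, the Blocks World domain and initial state, the helper names (translate_goal, write_problem_file), the model choice, and the use of the OpenAI Python client and the Fast Downward command line are assumptions made for illustration.

```python
import subprocess
from openai import OpenAI  # assumes the official openai Python package (>= 1.0) is installed

# Hypothetical few-shot prompt: one example pairing a natural-language goal with a
# PDDL :goal expression for Blocks World, followed by the new goal to translate.
PROMPT_TEMPLATE = """Translate the goal into a PDDL :goal expression for the blocksworld domain.

Goal: Stack block A on top of block B.
PDDL: (:goal (on a b))

Goal: {nl_goal}
PDDL:"""


def translate_goal(nl_goal: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to translate a natural-language goal into a PDDL goal expression."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(nl_goal=nl_goal)}],
        temperature=0.0,  # deterministic output is easier to parse as structured text
    )
    return response.choices[0].message.content.strip()


def write_problem_file(goal_pddl: str, path: str = "problem.pddl") -> str:
    """Embed the translated goal in a minimal Blocks World problem file (initial state is illustrative)."""
    problem = f"""(define (problem nl-goal)
  (:domain blocksworld)
  (:objects a b c)
  (:init (ontable a) (ontable b) (on c a) (clear b) (clear c) (handempty))
  {goal_pddl})"""
    with open(path, "w") as f:
        f.write(problem)
    return path


if __name__ == "__main__":
    goal = translate_goal("Put block C on the table and stack A on B.")
    problem_path = write_problem_file(goal)
    # Hand the generated problem to a domain-independent planner (here, Fast Downward).
    subprocess.run(
        ["./fast-downward.py", "blocksworld-domain.pddl", problem_path,
         "--search", "astar(lmcut())"],
        check=True,
    )
```

A usage note: keeping the sampling temperature at zero and constraining the output format with few-shot examples makes the translated goal easier to validate syntactically before it is passed to the planner.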
