Distilling Script Knowledge from Large Language Models for Constrained Language Planning

In everyday life, humans often plan their actions by following step-by-step instructions in the form of goal-oriented scripts. Previous work has exploited language models (LMs) to plan for abstract goals of stereotypical activities (e.g., "make a cake"), but has left more specific goals with multi-faceted constraints understudied (e.g., "make a cake for diabetics"). In this paper, we define the task of constrained language planning for the first time. We propose an over-generate-then-filter approach to improve large language models (LLMs) on this task, and use it to distill a novel constrained language planning dataset, CoScript, which consists of 55,000 scripts. Empirical results demonstrate that our method significantly improves the constrained language planning ability of LLMs, especially on constraint faithfulness. Furthermore, CoScript proves effective in endowing smaller LMs with constrained language planning ability.
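The over-generate-then-filter idea can be illustrated with a minimal sketch: sample several candidate scripts from an LLM for a constrained goal, then keep only the candidates that look faithful to the constraint. The function names, the prompt template, and the keyword-overlap filter below are illustrative assumptions for exposition only, not the paper's actual pipeline, which the abstract does not specify in detail.

```python
# Minimal sketch of an over-generate-then-filter pipeline for constrained
# language planning. The sampler, prompt format, and faithfulness check are
# assumptions made for illustration, not the authors' implementation.
from typing import Callable, List


def over_generate(goal: str, constraint: str,
                  sample: Callable[[str], str], k: int = 10) -> List[str]:
    """Draw k candidate scripts from an LLM sampler for a constrained goal."""
    prompt = f"List the steps to {goal} {constraint}:"
    return [sample(prompt) for _ in range(k)]


def constraint_faithful(script: str, constraint: str) -> bool:
    """Crude faithfulness check: keep scripts that mention constraint terms.
    (A stand-in for whatever stronger filtering the full method uses.)"""
    terms = [w.lower() for w in constraint.split() if len(w) > 3]
    return any(t in script.lower() for t in terms)


def generate_filtered_scripts(goal: str, constraint: str,
                              sample: Callable[[str], str],
                              k: int = 10) -> List[str]:
    """Over-generate k candidates, then filter out constraint-unfaithful ones."""
    candidates = over_generate(goal, constraint, sample, k)
    return [c for c in candidates if constraint_faithful(c, constraint)]


if __name__ == "__main__":
    # Toy sampler standing in for a real LLM call.
    def toy_sampler(prompt: str) -> str:
        return ("1. Choose a sugar-free recipe suitable for diabetics. "
                "2. Mix the batter. 3. Bake and cool before serving.")

    print(generate_filtered_scripts("make a cake", "for diabetics",
                                    toy_sampler, k=3))
```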
