Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing

Recent work on training large language models (LLMs) to follow natural language instructions has opened up exciting opportunities for natural language interface design. Building on the prior success of LLMs in computer-assisted creativity, we present CoPoet, a collaborative poetry writing system, with the goal of studying whether instruction-tuned LLMs actually improve the quality of the generated content. In contrast to auto-completing a user’s text, CoPoet is controlled by user instructions that specify the attributes of the desired text, such as “Write a sentence about ‘love’” or “Write a sentence ending in ‘fly’”. The core component of our system is a language model fine-tuned on a diverse collection of instructions for poetry writing. Our model is not only competitive with publicly available LLMs trained on instructions (InstructGPT), but is also capable of satisfying unseen compositional instructions. A study with 15 qualified crowdworkers shows that users successfully write poems with CoPoet on diverse topics ranging from monarchy to climate change, and that these poems are preferred by third-party evaluators over poems written without the system.
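To make the interaction pattern concrete, the sketch below shows how an instruction-conditioned generation call might look for a T5-style seq2seq model fine-tuned on poetry-writing instructions, using the HuggingFace Transformers API. This is a minimal illustration, not the released system: the checkpoint name `copoet-t5` is hypothetical, and the sampling settings are ordinary defaults for open-ended generation.

```python
# Minimal sketch of instruction-conditioned poetry generation, assuming a
# T5-style model fine-tuned on poetry-writing instructions. The checkpoint
# name "copoet-t5" is a placeholder, not a released artifact.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("copoet-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("copoet-t5")

def generate_line(instruction: str, max_new_tokens: int = 32) -> str:
    """Map a natural language instruction to a candidate poem line."""
    inputs = tokenizer(instruction, return_tensors="pt")
    # Nucleus sampling keeps candidate lines varied rather than repetitive.
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Attribute-style instructions, including a compositional one that
# combines a topic constraint with an end-word constraint.
print(generate_line("Write a sentence about 'love'."))
print(generate_line("Write a sentence about 'love' ending in 'fly'."))
```

In a collaborative session, the user would issue such instructions turn by turn, keeping or editing each suggested line before requesting the next one.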
