Substance over Style: Document-Level Targeted Content Transfer

Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model (GPT-2) and train it on a large number of roughly aligned recipe pairs (this https URL). Both automatic and human evaluations show that our model outperforms existing methods by generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model's rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.
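To make the setup concrete, below is a minimal sketch of how a GPT-2 model could be fine-tuned on roughly aligned (constraint, source recipe, target recipe) pairs with the standard causal language-modeling objective. The control-token format (`<|constraint|>`, `<|source|>`, `<|target|>`), the example pair, and the single gradient step are illustrative assumptions for exposition, not the paper's exact input scheme or training loop.

```python
# Sketch: constraint-conditioned fine-tuning of GPT-2 on roughly aligned
# recipe pairs. Token layout and the example pair are assumptions.
import torch
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def build_example(constraint, source_recipe, target_recipe):
    """Serialize one aligned pair into a single training string."""
    return (
        f"<|constraint|> {constraint} "
        f"<|source|> {source_recipe} "
        f"<|target|> {target_recipe}{tokenizer.eos_token}"
    )

# Hypothetical pair: the model learns to rewrite the source recipe
# so that the target satisfies the dietary constraint.
text = build_example(
    "vegan",
    "Whisk two eggs with milk, then fry in butter.",
    "Whisk flax eggs with soy milk, then fry in vegan margarine.",
)

enc = tokenizer(text, return_tensors="pt")
# Standard causal LM objective: labels are the input ids themselves.
loss = model(**enc, labels=enc["input_ids"]).loss
loss.backward()  # one illustrative step; a real run uses an optimizer loop
```

At inference time, under the same assumed format, the prompt would contain only the constraint and the source recipe up through `<|target|>`, and the model would generate the rewritten recipe as its continuation.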
