Controlling Text Edition by Changing Answers of Specific Questions

In this paper, we introduce the new task of controllable text edition: given a long text, a question, and a target answer, the goal is to produce a minimally modified text that is consistent with the target answer. This task is important in many practical situations, such as changing conditions, consequences, or properties in a legal document, or updating key information about an event in a news article. It is also very challenging: parallel corpora for training are hard to obtain, and a system must first locate all text positions that need to change and then decide how to change them. We construct a new dataset, WIKIBIOCTE, for this task, based on the existing WIKIBIO dataset (originally created for table-to-text generation). We use WIKIBIOCTE for training and manually label a test set for evaluation. We also propose novel evaluation metrics and a novel method for solving the task. Experimental results on the test set show that our proposed method is well suited to this novel NLP task.
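To make the task's input/output contract concrete, the following sketch shows a toy instance with a naive span-replacement baseline. This is purely illustrative: the function name, field values, and the baseline itself are hypothetical and are not the paper's method, which must locate and rewrite every affected position rather than substituting a single span.

```python
# Illustrative sketch of the controllable text edition task I/O.
# The baseline and all example values are hypothetical, not from the paper.

def edit_text(text: str, old_answer: str, target_answer: str) -> str:
    """Naive baseline: replace the span realizing the current answer
    with the target answer. The real task additionally requires
    finding all positions affected by the change, which this ignores."""
    return text.replace(old_answer, target_answer)

text = "John Smith (born 1970) is a British engineer."
question = "When was John Smith born?"   # the question selects the fact to edit
target_answer = "1985"

edited = edit_text(text, "1970", target_answer)
print(edited)  # "John Smith (born 1985) is a British engineer."
```

Even this toy case hints at why the task is hard: a correct edit must change only the spans entailed by the new answer while leaving the rest of the text minimally disturbed.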
