Controllable Text Simplification with Explicit Paraphrasing

Text simplification improves the readability of sentences through rewriting transformations such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models trained end-to-end to perform all of these operations simultaneously. However, such systems mostly delete words and cannot easily adapt to the requirements of different target audiences. In this paper, we propose a novel hybrid approach that leverages linguistically motivated rules for splitting and deletion and couples them with a neural paraphrasing model to produce varied rewriting styles. We introduce a new data augmentation method to improve the paraphrasing capability of our model. Through automatic and manual evaluations, we show that our proposed model establishes a new state of the art for the task, paraphrases more often than existing systems, and can control the degree of each simplification operation applied to the input texts.
