Data-Driven Sentence Simplification: Survey and Benchmark

Sentence Simplification (SS) aims to modify a sentence to make it easier to read and understand. To do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is challenging and still far from solved. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify from corpora of aligned original-simplified sentence pairs in English, which is currently the dominant paradigm. We also include a benchmark of different approaches on common datasets to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
