Improving Human Text Simplification with Sentence Fusion

Fully automated text simplification systems are not yet good enough for real-world use, so human simplifications are used instead. In this paper, we examine how crowdsourcing can reduce the cost and improve the quality of human simplifications. We introduce a graph-based sentence fusion approach that augments human simplifications, and a reranking approach that both selects high-quality simplifications and allows targeting different levels of simplicity. Using the Newsela dataset (Xu et al., 2015), we show consistent improvements over experts at varying simplification levels and find that the additional sentence fusion simplifications allow for simpler output than the human simplifications alone.
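The abstract does not spell out how the graph-based fusion works, so the following is only a minimal sketch of the general word-graph idea (in the spirit of multi-sentence compression), not the authors' method. It assumes whitespace tokenization, merges identical lowercased words into one node, weights each edge by the inverse of its transition count, and enumerates the cheapest loopless paths with a simple best-first search. The names `build_word_graph` and `fuse` and the parameters `k` and `min_len` are hypothetical.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(sentences):
    """Count how often each word-to-word transition occurs across sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Assumption: lowercase whitespace tokens; identical words share a node.
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def fuse(sentences, k=3, min_len=3):
    """Return up to k cheapest loopless START->END paths as (cost, text) pairs."""
    graph = build_word_graph(sentences)
    heap = [(0.0, [START])]  # partial paths ordered by accumulated cost
    results = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            words = path[1:-1]
            if len(words) >= min_len:
                results.append((cost, " ".join(words)))
            continue
        for nxt, n in graph[node].items():
            if nxt not in path:  # keep paths loopless
                # Frequent transitions are cheap: cost is 1 / count.
                heapq.heappush(heap, (cost + 1.0 / n, path + [nxt]))
    return results

candidates = fuse([
    "the storm damaged many homes",
    "the storm destroyed many houses",
    "a storm damaged many homes",
])
for cost, text in candidates:
    print(f"{cost:.2f}  {text}")
```

Because paths may mix words from different inputs, the candidate set can contain sentences that no single worker wrote, which is what makes fusion a source of additional simplifications. In a full pipeline, candidates like these would presumably be passed to a reranker; the search here is exponential in the worst case and is meant only for illustration.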

[1] Advaith Siddharthan et al. A survey of research on text simplification, 2014.

[2] Jonas Mueller et al. Siamese Recurrent Architectures for Learning Sentence Similarity, 2016, AAAI.

[3] Chris Callison-Burch et al. Problems in Current Text Simplification Research: New Data Can Help, 2015, TACL.

[4] Bertrand Thirion et al. Learning to rank from medical imaging data, 2012, MLMI.

[5] Alexander M. Rush et al. Character-Aware Neural Language Models, 2015, AAAI.

[6] Chris Callison-Burch et al. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk, 2009, EMNLP.

[7] J. Y. Yen et al. Finding the K Shortest Loopless Paths in a Network, 2007.

[8] Lucia Specia et al. An Analysis of Crowdsourced Text Simplifications, 2014, PITR@EACL.

[9] M. Marelli et al. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment, 2014, SemEval.

[10] Shashi Narayan et al. Unsupervised Sentence Simplification Using Deep Semantics, 2015, INLG.

[11] Matthew Shardlow et al. Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table, 2019, ACL.

[12] C. Zarcadoolas et al. The simplicity complex: exploring simplified health messages in a complex world, 2011, Health Promotion International.

[13] Chih-Jen Lin et al. Large-Scale Linear RankSVM, 2014, Neural Computation.

[14] Walter S. Lasecki et al. Measuring text simplification with the crowd, 2015, W4A.

[15] Katja Filippova et al. Multi-Sentence Compression: Finding Shortest Paths in Word Graphs, 2010, COLING.

[16] Andreas Stolcke et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[17] Alexander M. Rush et al. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks, 2016, NAACL.

[18] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[19] Chris Callison-Burch et al. Creating Speech and Language Data With Amazon’s Mechanical Turk, 2010, Mturk@HLT-NAACL.

[20] Chris Callison-Burch et al. Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification, 2019, NAACL.

[21] Thorsten Brants et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.

[22] Chris Callison-Burch et al. Optimizing Statistical Machine Translation for Text Simplification, 2016, TACL.

[23] Ben Taskar et al. Alignment by Agreement, 2006, NAACL.

[24] Bowen Zhou et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.

[25] Max Schwarzer et al. Human Evaluation for Text Simplification: The Simplicity-Adequacy Tradeoff, 2018.

[26] Mirella Lapata et al. Sentence Simplification with Deep Reinforcement Learning, 2017, EMNLP.

[27] Chris Callison-Burch et al. Crowdsourcing Translation: Professional Quality from Non-Professionals, 2011, ACL.

[28] Matthew Shardlow et al. A Survey of Automated Text Simplification, 2014.