Simple, readable sub-sentences

We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Native speakers rate the output of our approach as equally grammatical and as appropriate for beginner readers as that of a supervised SMT-based baseline system, but our setup performs more radical changes that better resemble the variation observed in human-generated simplifications.
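
The sample-and-rank idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the system described here: candidates are drawn by independent token deletion, and the loss is a toy combination of the LIX readability index (sentence length plus percentage of words longer than six letters) and a hypothetical penalty for deleting too much. The function names `lix`, `sample_subsentences`, `loss`, and `simplify`, and the parameters `keep_prob` and `min_keep`, are likewise hypothetical.

```python
import random


def lix(tokens):
    """LIX for a single sentence: word count plus percentage of long words."""
    if not tokens:
        return float("inf")
    long_words = sum(1 for t in tokens if len(t) > 6)
    return len(tokens) + 100.0 * long_words / len(tokens)


def sample_subsentences(tokens, n_samples, keep_prob, rng):
    """Sample candidates by keeping each token with probability keep_prob.

    Token order is preserved, so every candidate is a sub-sentence of the input.
    """
    candidates = set()
    for _ in range(n_samples):
        kept = tuple(t for t in tokens if rng.random() < keep_prob)
        if kept:
            candidates.add(kept)
    return sorted(candidates)


def loss(candidate, source, min_keep=0.5):
    """Toy loss (an assumption): LIX plus a penalty for dropping too many tokens."""
    penalty = 0.0 if len(candidate) >= min_keep * len(source) else 1000.0
    return lix(candidate) + penalty


def simplify(sentence, n_samples=200, keep_prob=0.7, seed=0):
    """Return the sampled sub-sentence with the lowest loss."""
    rng = random.Random(seed)
    tokens = sentence.split()
    candidates = sample_subsentences(tokens, n_samples, keep_prob, rng)
    # Keep the unchanged input as a fallback candidate.
    candidates.append(tuple(tokens))
    best = min(candidates, key=lambda c: loss(c, tokens))
    return " ".join(best)


if __name__ == "__main__":
    src = ("The committee subsequently postponed the extraordinarily "
           "complicated decision until further notice")
    print(simplify(src))
```

A loss this simple ignores grammaticality entirely; a realistic ranking function would also need some signal for fluency, such as a language model score, which this sketch leaves out.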
