Simple, readable sub-sentences

We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Native speakers rate our approach as equally grammatical and as appropriate for beginner readers as a supervised SMT-based baseline system, but our setup performs more radical changes that better resemble the variation observed in human-generated simplifications.
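
The sample-and-rank idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which loss is used, so the LIX readability index stands in for it here, and the function names, the `keep_prob` sampling rate, and the `min_ratio` length constraint are all hypothetical choices made for illustration. The sketch also ignores grammaticality entirely.

```python
import random

def lix(tokens):
    """LIX for a single sentence: word count plus the percentage of long words (> 6 letters)."""
    if not tokens:
        return float("inf")
    long_words = sum(1 for t in tokens if len(t) > 6)
    return len(tokens) + 100.0 * long_words / len(tokens)

def sample_subsentence(tokens, keep_prob, rng):
    """Keep each token independently with probability keep_prob, preserving word order."""
    return [t for t in tokens if rng.random() < keep_prob]

def loss(candidate, source, min_ratio=0.5):
    """Hypothetical loss: LIX, with candidates that drop too much of the source ruled out."""
    if len(candidate) < min_ratio * len(source):
        return float("inf")
    return lix(candidate)

def simplify(sentence, n_samples=200, keep_prob=0.7, seed=0):
    """Sample random sub-sentences and return the one with the lowest loss."""
    rng = random.Random(seed)
    source = sentence.split()
    # The unchanged source is always a candidate, so a finite-loss option exists.
    candidates = [source] + [
        sample_subsentence(source, keep_prob, rng) for _ in range(n_samples)
    ]
    best = min(candidates, key=lambda c: loss(c, source))
    return " ".join(best)
```

Calling `simplify("some input sentence")` returns a sub-sentence whose words appear in their original order. Because a purely length-based index rewards deleting long words regardless of meaning, a usable loss would need further terms (for example a language-model score) to keep the output grammatical.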

Sigrid Klerke | Anders Søgaard

[1] Slav Petrov, et al. A Universal Part-of-Speech Tagset, 2011, LREC.

[2] Ani Nenkova, et al. Revisiting Readability: A Unified Framework for Predicting Text Quality, 2008, EMNLP.

[3] Jonas Rybing, et al. CogFLUX: Grunden till ett automatiskt textförenklingssystem för svenska, 2009.

[4] Renata Pontin de Mattos Fortes, et al. A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems, 2008, SIGDOC '08.

[5] Mirella Lapata, et al. Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming, 2011, EMNLP.

[6] John Tait, et al. Cohesive Generation of Syntactically Simplified Newspaper Text, 2000, TSD.

[7] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[8] Violeta Seretan. Acquisition of Syntactic Simplification Rules for French, 2012, LREC.

[9] Lucia Specia, et al. SemEval-2012 Task 1: English Lexical Simplification, 2012, *SEMEVAL.

[10] Mari Ostendorf, et al. Text simplification for language learners: a corpus analysis, 2007, SLaTE.

[11] Mari Ostendorf, et al. Identifying targets for syntactic simplification, 2011, SLaTE.

[12] Advaith Siddharthan, et al. Text Simplification using Typed Dependencies: A Comparision of the Robustness of Different Generation Strategies, 2011, ENLG.

[13] David Kauchak, et al. Simple English Wikipedia: A New Text Simplification Task, 2011, ACL.

[14] Horacio Saggion, et al. A Hybrid System for Spanish Text Simplification, 2012, SLPAT@HLT-NAACL.

[15] Cristian Danescu-Niculescu-Mizil, et al. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia, 2010, NAACL.

[16] Napoleon Katsos, et al. Offline Sentence Processing Measures for testing Readability with Users, 2012, PITR@NAACL-HLT.

[17] Graeme Hirst, et al. Building Readability Lexicons with Unannotated Corpora, 2012, PITR@NAACL-HLT.

[18] Mari Ostendorf, et al. Reading Level Assessment Using Support Vector Machines and Statistical Language Models, 2005, ACL.

[19] Robert N. Kantor, et al. On the Failure of Readability Formulas to Define Readable Texts: A Case Study from Adaptations, 1982.

[20] C. Bjornsson. Readability of Newspapers in 11 Languages, 1983.

[21] Siobhan Devlin, et al. Simplifying Text for Language-Impaired Readers, 1999, EACL.

[22] Raman Chandrasekar, et al. Motivations and Methods for Text Simplification, 1996, COLING.

[23] Bernd Bohnet, et al. Very high accuracy and fast dependency parsing is not a contradiction, 2010, COLING 2010.

[24] Marie-Francine Moens, et al. A Dataset for the Evaluation of Lexical Simplification, 2012, CICLing.

[25] Lucia Specia. Translating from Complex to Simplified Sentences, 2010, PROPOR.

[26] Walter Daelemans, et al. Automatic Sentence Simplification for Subtitling in Dutch and English, 2004, LREC.

[27] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[28] Horacio Saggion, et al. Towards Automatic Lexical Simplification in Spanish: An Empirical Study, 2012, PITR@NAACL-HLT.

[29] Noah A. Smith, et al. Extracting Simplified Statements for Factual Question Generation, 2010.

[30] Jonathan Anderson. Lix and Rix: Variations on a Little-Known Readability Index, 1983.

[31] M. Trautner, et al. The Danish Dependency Treebank and the DTAG Treebank Tool, 2003.

[32] R. Flesch. A new readability yardstick, 1948, The Journal of applied psychology.

[33] Mauro Cettolo, et al. IRSTLM: an open source toolkit for handling large scale language models, 2008, INTERSPEECH.

[34] Advaith Siddharthan, et al. Complex Lexico-syntactic Reformulation of Sentences Using Typed Dependency Representations, 2010, INLG.

[35] Iryna Gurevych, et al. A Monolingual Tree-based Translation Model for Sentence Simplification, 2010, COLING.

[36] Walt Detmar Meurers, et al. On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition, 2012, BEA@NAACL-HLT.

[37] Sigrid Klerke, et al. DSim, a Danish Parallel Corpus for Text Simplification, 2012, LREC.

[38] Hermann Ney, et al. A Comparison of Alignment Models for Statistical Machine Translation, 2000, COLING.