Translating sentences from 'original' to 'simplified' Spanish

Text Simplification (TS) aims to convert complex sentences into simpler variants that are accessible to wider audiences. Several recent studies have treated TS as a monolingual machine translation (MT) task (translating from 'original' to 'simplified' language rather than from one language into another) using the standard phrase-based statistical machine translation (PB-SMT) model. We investigate whether this approach is equally successful regardless of the type of simplification to be learned, given that different target audiences require different levels of simplification. Our preliminary results indicate that the standard PB-SMT model might not be able to learn the strong simplifications needed by certain users, e.g. people with Down syndrome. However, the phrase tables obtained during the translation process seem to capture some adequate lexical simplifications.
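To illustrate how lexical simplifications might be read off a phrase table, the following is a minimal sketch and not the procedure used in the paper. It assumes a Moses-style text phrase table whose fields are separated by `|||` and whose third score is the direct phrase translation probability p(target | source). The function name `lexical_candidates`, the threshold `min_prob`, and the sample entries with their scores are all hypothetical and invented for illustration.

```python
from collections import defaultdict


def lexical_candidates(lines, min_prob=0.3):
    """Collect one-word substitutions from Moses-style phrase-table lines.

    Hypothetical helper: assumes 'source ||| target ||| scores ||| ...'
    with the third score being p(target | source).
    """
    candidates = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 3:
            continue  # skip malformed lines
        source, target, scores = fields[0], fields[1], fields[2].split()
        # Keep only single-word pairs in which the word actually changes.
        if len(source.split()) != 1 or len(target.split()) != 1:
            continue
        if source == target or len(scores) < 3:
            continue
        direct_prob = float(scores[2])  # assumed p(target | source)
        if direct_prob >= min_prob:
            candidates[source].append((target, direct_prob))
    # Rank the candidates for each source word by probability, best first.
    for source in candidates:
        candidates[source].sort(key=lambda pair: pair[1], reverse=True)
    return dict(candidates)


# Invented sample entries; the scores are not taken from any real model.
sample = [
    "adquirir ||| comprar ||| 0.50 0.40 0.60 0.45 ||| 0-0 ||| 10 12 6",
    "adquirir ||| adquirir ||| 0.30 0.20 0.35 0.25 ||| 0-0 ||| 10 12 4",
    "no obstante ||| pero ||| 0.40 0.10 0.70 0.20 ||| 0-0 1-0 ||| 5 7 4",
    "adquirir ||| obtener ||| 0.10 0.05 0.05 0.04 ||| 0-0 ||| 10 12 1",
]
print(lexical_candidates(sample))
```

Restricting the filter to single-word pairs keeps the sketch focused on lexical substitutions; multi-word entries such as 'no obstante' to 'pero' would need a looser filter.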
