Putting it Simply: a Context-Aware Approach to Lexical Simplification

We present a method for lexical simplification. Simplification rules are learned from a comparable corpus and then applied to input sentences in a context-aware fashion. Our method is unsupervised, and it does not require any alignment or correspondence between the complex and simple corpora. We evaluate the simplifications on three criteria: preservation of grammaticality, preservation of meaning, and degree of simplification. Results show that our method outperforms an established simplification baseline on both meaning preservation and degree of simplification, while maintaining a high level of grammaticality.
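To make "applying rules in a context-aware fashion" concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It assumes a hypothetical rule table mapping a complex word to simpler candidates, plus a hypothetical context profile (a bag of co-occurring words) for each candidate, as might be collected from a simple-language corpus. A rule fires only when the cosine similarity between the target word's local context window and the candidate's profile reaches a threshold. The names `rules`, `profiles`, `window` and `threshold` are all illustrative.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of tokens[index], excluding the word itself."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(t for i, t in enumerate(tokens[lo:hi], start=lo) if i != index)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def simplify(tokens, rules, profiles, threshold=0.1, window=2):
    """Replace each complex word with its best-matching simpler candidate.

    rules:    complex word -> list of simpler candidates (hypothetical rule table)
    profiles: candidate -> Counter of context words seen with it (hypothetical)
    A substitution is made only if the best similarity is >= threshold;
    otherwise the original word is kept.
    """
    out = list(tokens)
    for i, tok in enumerate(tokens):
        candidates = rules.get(tok)
        if not candidates:
            continue  # no rule for this word
        ctx = context_vector(tokens, i, window)
        best, best_score = None, -1.0
        for cand in candidates:
            score = cosine(ctx, profiles.get(cand, Counter()))
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            out[i] = best
    return out
```

With `rules = {"commence": ["begin", "start"]}` and a `begin` profile that includes "meeting" and "will", the sentence "the meeting will commence at noon" becomes "the meeting will begin at noon". In "they commence now", no candidate profile overlaps the context, so the word is left unchanged. The threshold is the design lever here. Refusing a substitution when the context is uninformative trades some simplification for meaning preservation.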