Comparing Resources for Spanish Lexical Simplification

In this paper we study the effect of different lexical resources and synonym selection strategies on a lexical simplification system for Spanish. The resources used in the experiments are the Spanish EuroWordNet, the Spanish Open Thesaurus, and a combination of both. For synonym selection, we use both local and global contexts for word sense disambiguation. We present a novel evaluation framework for lexical simplification that takes into account the ambiguity level of the word to be simplified. The evaluation compares several instances of the lexical simplification system, a gold standard, and a baseline. On the basis of our results, we recommend different resources and word sense disambiguation methods depending on the ambiguity level of the target word.
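The abstract does not say how the local and global contexts are built. The sketch below shows one hypothetical way to contrast them, using a bag-of-words vector space model and cosine similarity. It makes several assumptions that do not come from the source:

* The local context is a fixed window of 4 words on each side of the target.
* The global context is the whole document.
* Each candidate sense comes with a precomputed context vector.
* The synonym chosen within a sense is the most frequent one, used here as a proxy for simplicity.

The sense inventory, sense vectors, and frequency counts are invented toy data. They are not taken from EuroWordNet or the Spanish Open Thesaurus.

```python
from collections import Counter
from math import sqrt


def bow(tokens):
    """Bag-of-words vector represented as a token -> count mapping."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_context(tokens, index, window=4):
    """Words within `window` positions of the target (assumed window size)."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]


def global_context(tokens, index):
    """Every word in the document except the target itself."""
    return tokens[:index] + tokens[index + 1:]


def pick_sense(context, senses):
    """Choose the sense whose (hypothetical) context vector best matches the context."""
    ctx = bow(context)
    return max(senses, key=lambda s: cosine(ctx, senses[s]["vector"]))


def simplify(tokens, index, senses, frequency, use_global=False):
    """Disambiguate the target, then return the most frequent synonym of that sense.

    Treating frequency as simplicity is an assumption of this sketch.
    """
    if use_global:
        context = global_context(tokens, index)
    else:
        context = local_context(tokens, index)
    sense = pick_sense(context, senses)
    return max(senses[sense]["synonyms"], key=lambda w: frequency.get(w, 0))


# Toy data (invented for illustration): two senses of "banco".
senses = {
    "finance": {"vector": bow(["dinero", "cuenta", "préstamo", "crédito"]),
                "synonyms": ["entidad", "caja"]},
    "seat": {"vector": bow(["sentarse", "parque", "madera", "plaza"]),
             "synonyms": ["asiento", "escaño"]},
}
frequency = {"entidad": 50, "caja": 80, "asiento": 120, "escaño": 10}

doc = ("me senté en el banco del parque de madera y luego pedí dinero "
       "para un préstamo con mi cuenta y mi crédito").split()
target = doc.index("banco")

print(simplify(doc, target, senses, frequency))                   # asiento
print(simplify(doc, target, senses, frequency, use_global=True))  # caja
```

With these toy data, the local window picks the "seat" reading of *banco*. The global context is dominated by the financial vocabulary later in the document, so it picks the "finance" reading instead. This illustrates why the choice between local and global context may matter more for highly ambiguous target words.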
