Can Spanish Be Simpler? LexSiS: Lexical Simplification for Spanish

Lexical simplification is the task of replacing a word in a given context with an easier-to-understand synonym. Although a number of lexical simplification approaches have been developed in recent years, most of them have been applied to English, and recent work takes advantage of parallel monolingual datasets for training. Here we present LexSiS, a lexical simplification system for Spanish that does not require a parallel corpus but instead relies on freely available resources, such as an on-line dictionary and the Web as a corpus. LexSiS uses three techniques to find a suitable substitute for a word: a word vector model, word frequency, and word length. In experiments with human informants, LexSiS performed better than a hard-to-beat baseline based on synonym frequency.
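To make the combination of the three signals concrete, the following is a minimal sketch, not the LexSiS implementation. It assumes that contextual fit is measured as the cosine between a bag-of-words context vector and a co-occurrence vector for each candidate, and that simplicity grows with frequency and shrinks with word length. The function names, the scoring formula, the `weight` parameter, and the toy counts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simplicity(word, freq):
    """Assumed simplicity score: frequent and short words score higher."""
    return math.log(1 + freq.get(word, 0)) / len(word)


def rank_substitutes(context, candidates, vectors, freq, weight=0.5):
    """Rank candidate synonyms by a weighted mix of context fit and simplicity.

    `weight` is a hypothetical interpolation parameter, not taken from the paper.
    """
    if not candidates:
        return []
    ctx = Counter(context)
    # Normalise simplicity to [0, 1] so it is comparable with the cosine.
    max_simp = max(simplicity(c, freq) for c in candidates) or 1.0
    scored = []
    for cand in candidates:
        fit = cosine(ctx, vectors.get(cand, {}))
        simp = simplicity(cand, freq) / max_simp
        scored.append((weight * fit + (1 - weight) * simp, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]


# Toy data (invented): synonyms of "morada" in a context about family life.
vectors = {
    "casa": {"familia": 3, "vivir": 2},
    "domicilio": {"fiscal": 2, "vivir": 1},
    "residencia": {"estudiantes": 2, "vivir": 1},
}
freq = {"casa": 1000, "domicilio": 100, "residencia": 200}
context = ["familia", "vivir"]

ranking = rank_substitutes(context, list(vectors), vectors, freq)
print(ranking[0])  # casa
```

In this toy setting "casa" wins on every signal: its vector is closest to the context, it is the most frequent candidate, and it is the shortest. With real data the signals can disagree, which is why the sketch exposes an explicit weight rather than fixing one.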
