Text Simplification Tools for Spanish

In this paper we describe the development of a text simplification system for Spanish. Text simplification is the adaptation of a text to the special needs of certain groups of readers, such as language learners, people with cognitive difficulties and elderly people, among others. There is a clear need for simplified texts, but producing them manually or adapting existing texts is labour-intensive and costly. Automatic simplification is attracting growing attention in Natural Language Processing, but, to the best of our knowledge, there are no simplification tools for Spanish. We present a prototype for automatic simplification, which shows that the most important structural simplification operations can be handled successfully with a rule-based approach that can potentially be improved by statistical methods. To develop this prototype, we carried out a corpus study aimed at identifying the operations a text simplification system needs to perform in order to produce output similar to what human editors produce when they simplify texts.
