Reducing Text Complexity through Automatic Lexical Simplification: an Empirical Study for Spanish

In this paper we present the results of a study directed towards developing the lexical simplification module of an automatic text simplification system for Spanish, intended for readers with cognitive disabilities. We examine the word length and word frequency distributions of the two sets of texts that make up our parallel corpus, and we focus on cases of information expansion (through the insertion of definitions) and content reduction (through summarisation). Our ultimate goal is the computational implementation of these lexical changes.
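
As a rough illustration of the kind of comparison described above, the following minimal sketch computes mean word length and word frequency counts for two sets of texts. It is not the study's actual procedure: the function name `word_stats`, the regular-expression tokenisation, and the two sample sentences are all assumptions made up for this example.

```python
import re
from collections import Counter


def word_stats(texts):
    """Return (mean word length, Counter of lowercased word frequencies).

    Hypothetical helper: tokenisation is a simple regex over letter runs,
    which is an assumption, not the tokeniser used in the study.
    """
    # [^\W\d_]+ matches runs of Unicode letters, so accented Spanish
    # characters stay inside the word.
    words = [w.lower() for t in texts for w in re.findall(r"[^\W\d_]+", t)]
    freqs = Counter(words)
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return mean_len, freqs


# Invented sentence pair standing in for the two sides of a parallel corpus.
original = ["El ayuntamiento inauguró una exposición fotográfica."]
simplified = ["El ayuntamiento abrió una muestra de fotos."]

orig_len, orig_freqs = word_stats(original)
simp_len, simp_freqs = word_stats(simplified)

print(f"mean word length, original:   {orig_len:.2f}")
print(f"mean word length, simplified: {simp_len:.2f}")
print("most frequent words, original:", orig_freqs.most_common(3))
```

On a real parallel corpus the same two statistics would be computed over all original texts and all simplified texts, and the resulting distributions compared.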
