Making It Simplext

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, once mature, could be used to produce texts adapted to the specific needs of particular users. Most research on automatic text simplification has dealt with English. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification, grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system's output, taking into account the interaction between three modules, each dedicated to a different simplification aspect. This automatic evaluation, based on readability metrics for Spanish, shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in automatic text simplification for English.
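
The idea of independent simplification modules evaluated through readability indicators can be illustrated with a toy sketch. This is not the Simplext implementation: the two rules (splitting a sentence at the coordination ", y " and swapping words via a small synonym table), the frequent-word list, and the two indicators (average sentence length and share of infrequent words, the kind of surface features many readability formulas build on) are all hypothetical choices made for illustration.

```python
import re
from typing import Callable, List, Tuple

# A module takes a list of sentences and returns a (possibly simplified) list.
Module = Callable[[List[str]], List[str]]

# Hypothetical toy resources, for illustration only; a real system would
# derive its rules and word lists from corpora.
SIMPLER_SYNONYMS = {"adquirir": "comprar", "residir": "vivir"}
FREQUENT_WORDS = {"una", "casa", "y", "ella", "quiere", "en", "madrid",
                  "comprar", "vivir"}


def tokenize(sentence: str) -> List[str]:
    """Lowercased word tokens; in Python 3, word characters include accented letters."""
    return re.findall(r"\w+", sentence.lower())


def syntactic_module(sentences: List[str]) -> List[str]:
    """Toy syntactic rule (hypothetical): split a sentence at ', y '."""
    result = []
    for sentence in sentences:
        parts = [p.strip().rstrip(".") for p in sentence.split(", y ")]
        if len(parts) == 1:
            result.append(sentence)  # nothing to split
            continue
        for part in parts:
            if part:
                # Capitalize each new sentence and close it with a period.
                result.append(part[0].upper() + part[1:] + ".")
    return result


def lexical_module(sentences: List[str]) -> List[str]:
    """Toy lexical rule (hypothetical): replace words that have a simpler synonym."""
    def swap(match):
        word = match.group(0)
        return SIMPLER_SYNONYMS.get(word.lower(), word)
    return [re.sub(r"\w+", swap, s) for s in sentences]


def run_pipeline(sentences: List[str], modules: List[Module]) -> List[str]:
    """Apply each module in turn; each one sees the previous module's output."""
    for module in modules:
        sentences = module(sentences)
    return sentences


def readability_indicators(sentences: List[str]) -> Tuple[float, float]:
    """Average sentence length in words, and share of words not in FREQUENT_WORDS."""
    words = [w for s in sentences for w in tokenize(s)]
    avg_sentence_length = len(words) / len(sentences)
    rare_word_ratio = sum(w not in FREQUENT_WORDS for w in words) / len(words)
    return avg_sentence_length, rare_word_ratio


original = ["Juan quiere adquirir una casa, y ella quiere residir en Madrid."]
simplified = run_pipeline(original, [syntactic_module, lexical_module])
print(simplified)  # ['Juan quiere comprar una casa.', 'Ella quiere vivir en Madrid.']
print(readability_indicators(original))
print(readability_indicators(simplified))  # (5.0, 0.1)
```

Because every module consumes and returns a plain list of sentences, modules can be added, dropped, or reordered, and the indicators recomputed, to compare their individual and combined effects on the output.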