Simple or Not Simple? A Readability Question

Text Simplification (TS) has become an important Natural Language Processing (NLP) application with a potentially significant societal impact: it can benefit users with limited language comprehension skills, such as children, non-native speakers without a good command of the language, and readers with language disabilities. With the recent emergence of various TS systems, the question we face is how to evaluate their performance automatically, given that access to target users may be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae can be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and are then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features that might be seen as reading obstacles for various target populations, indicating that these indices could serve as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values across four different corpora.
