Word frequency and readability: Predicting the text-level readability with a lexical-level attribute

Assessing text readability is important for matching texts of an appropriate level to readers at different proficiency levels. The present research approached readability assessment from the lexical perspective of word frequencies derived from corpora assumed to reflect typical language experience. Three studies were conducted to test how the word-level feature of word frequency can be aggregated to characterise text-level readability. The results show that an effective use of word frequency for text readability assessment should take a range of characteristics of the distribution of word frequencies into account. For characterising text readability, taking into account the standard deviation in addition to the mean of the word frequencies already significantly improves results. The best results are obtained using the mean frequencies of the words in language frequency bands or in bands obtained by agglomerative clustering of the word frequencies in the documents, although a comparison of within-corpus and cross-corpus results shows that using high numbers of fine-grained frequency bands generalizes poorly. Overall, the study advances our understanding of the relationship between word frequency and text readability and provides concrete options for making more effective use of lexical frequency information in practice.
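To make the aggregation idea concrete, the following is a minimal Python sketch of how word-level frequencies might be turned into text-level features: the mean and standard deviation of the log frequencies, plus the mean frequency within each frequency band. It is an illustration under stated assumptions, not the study's implementation. The function name `frequency_features`, the per-million frequency table, the log10 transform, the band cut points, and the floor value for unseen words are all hypothetical choices, and fixed cut points stand in here for both the language frequency bands and the clustering-based bands described above.

```python
import math
import statistics
from bisect import bisect_right


def frequency_features(tokens, freq_per_million, band_edges=(1.0, 2.0), floor=0.01):
    """Aggregate word-level frequencies into text-level features.

    tokens: list of word strings from one text.
    freq_per_million: dict mapping lowercased words to corpus frequencies
        (hypothetical table; any frequency norm could be substituted).
    band_edges: ascending cut points on the log10 scale (assumed values).
    floor: frequency assigned to words missing from the table (assumption).
    """
    # Log-transform each token's frequency; unseen words get the floor value.
    log_freqs = [math.log10(freq_per_million.get(t.lower(), floor)) for t in tokens]

    features = {
        "mean": statistics.fmean(log_freqs),
        "sd": statistics.pstdev(log_freqs),  # population SD over the text's tokens
    }

    # Assign each token to a band by its log frequency, then average per band.
    bands = [[] for _ in range(len(band_edges) + 1)]
    for lf in log_freqs:
        bands[bisect_right(band_edges, lf)].append(lf)
    for i, band in enumerate(bands):
        # An empty band has no mean; None marks it as missing.
        features[f"band_{i}_mean"] = statistics.fmean(band) if band else None

    return features


# Toy usage with made-up frequencies (not real corpus counts).
toy_freqs = {"the": 10000.0, "cat": 100.0, "sat": 10.0}
example = frequency_features(["The", "cat", "sat", "zyx"], toy_freqs, band_edges=(1.5, 3.0))
```

The resulting dictionary could then serve as the feature vector for a readability classifier or regressor. Increasing the number of bands gives a finer description of the frequency distribution, at the cost noted above of weaker cross-corpus generalization.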
