Digital Words: Moving Forward with Measuring the Readability of Online Texts

The readability of a digital text can influence people’s information acquisition (e.g., Wikipedia articles), online security (e.g., how-to articles), and even health (e.g., WebMD). Readability metrics can also alter search rankings and are used to evaluate AI system performance. However, prior work on measuring readability has significant gaps, especially for HCI applications. Prior work has (a) focused on grade-school texts, (b) ignored domain-specific, jargon-heavy texts (e.g., health advice), and (c) failed to compare metrics, especially with respect to scaling them to online corpora. This paper addresses these shortcomings by comparing well-known readability measures and a novel domain-specific approach across four different corpora: crowd-worker-generated stories, Wikipedia articles, security and privacy advice, and health information. We evaluate the convergent, discriminant, and content validity of each measure and detail tradeoffs in domain-specificity and participant burden. These results provide a foundation for more accurate readability measurements in HCI.

CCS CONCEPTS • Human-centered computing → HCI theory, concepts and models; Empirical studies in HCI
