ReadAid: A Robust and Fully-Automated Readability Assessment Tool

Reading is an integral part of educational development; however, it is frustrating for people who struggle to understand text documents that are above their readability levels, and unmotivating for those who are given documents below their levels. Finding appropriate reading materials, with or without first scanning through their contents, is a challenge, since there is a tremendous number of documents available today and the vast majority of them are not tagged with their readability levels. Even though existing readability assessment tools determine the readability levels of text documents, they analyze solely the lexical, syntactic, and/or semantic properties of a document, and their approaches are not fully automated, generalizable, or well defined, being mostly based on observation. To advance current readability analysis techniques, we propose a robust, fully automated readability analyzer, denoted ReadAid, which employs support vector machines to combine features from the US Curriculum and College Board, traditional readability measures, and the author(s) and subject area(s) of a text document d to assess the readability level of d. ReadAid can be applied to (i) filter documents (retrieved in response to a web query) of a particular readability level, (ii) determine the readability levels of digitized text documents, such as book chapters, magazine articles, and news stories, or (iii) dynamically analyze, in real time, the grade level of a text document as it is being written. The novelty of ReadAid lies in using authorship, subject areas, and academic concepts and grammatical constructions extracted from the US Curriculum to determine the readability level of a text document. Experimental results show that ReadAid is highly effective and outperforms existing state-of-the-art readability assessment tools.
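The abstract lists "traditional readability measures" among the features ReadAid feeds to its support vector machines, but it does not name them. The sketch below illustrates what such features might look like. It computes four well-known formula-based measures that need only letter, word, and sentence counts: the Automated Readability Index (ARI), the Coleman-Liau index, LIX, and RIX. The results are returned as a feature dictionary that a classifier could consume. The choice of these four measures, the regex tokenizer, and the function name `traditional_features` are all illustrative assumptions, not ReadAid's published feature set.

```python
import re
from typing import Dict


def traditional_features(text: str) -> Dict[str, float]:
    """Hypothetical helper: classic readability measures for one document.

    Assumption: naive regex tokenization. Words are runs of letters or
    apostrophes; sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)   # guard against division by zero
    n_sents = max(len(sentences), 1)

    # Count letters only (apostrophes excluded) for the character-based formulas.
    letters_per_word = [sum(ch.isalpha() for ch in w) for w in words]
    n_letters = sum(letters_per_word)
    # LIX and RIX treat a word with more than six letters as "long".
    n_long = sum(1 for n in letters_per_word if n > 6)

    chars_per_word = n_letters / n_words
    words_per_sent = n_words / n_sents

    # Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43
    ari = 4.71 * chars_per_word + 0.5 * words_per_sent - 21.43
    # Coleman-Liau: L = letters per 100 words, S = sentences per 100 words
    L = 100.0 * n_letters / n_words
    S = 100.0 * n_sents / n_words
    coleman_liau = 0.0588 * L - 0.296 * S - 15.8
    # LIX: average sentence length plus the percentage of long words
    lix = words_per_sent + 100.0 * n_long / n_words
    # RIX: long words per sentence
    rix = n_long / n_sents

    return {"ari": ari, "coleman_liau": coleman_liau, "lix": lix, "rix": rix}


if __name__ == "__main__":
    print(traditional_features("The cat sat on the mat. It was happy."))
```

In a ReadAid-style pipeline, these values would presumably be concatenated with the curriculum, authorship, and subject-area features before SVM training. The regex sentence splitter will overcount sentences around abbreviations such as "Dr.", so a real system would need a proper sentence segmenter.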