A readability level prediction tool for K‐12 books

The readability level of a book helps readers identify suitable reading materials. Unfortunately, most published books are assigned a range of readability levels, which is of little use to readers looking for books at a particular grade level. Existing readability formulas and analysis tools require at least an excerpt of a book to estimate its readability level. This is a severe constraint, since copyright laws prohibit book contents from being made publicly accessible. To alleviate this constraint, we have developed TRoLL, which predicts the readability level of a book from publicly accessible online metadata, together with a snippet of the book if one is available. Based on a multi‐dimensional regression analysis, TRoLL determines the grade level of any book instantly, even without a sample of its text. Unlike existing tools, it also considers the topical suitability of the book. TRoLL is a significant contribution to the educational community: its computed readability levels can enrich the book selections of K‐12 readers and help parents, teachers, and librarians locate suitable reading materials for them, a task that can be time‐consuming and frustrating and that does not always yield a quality outcome. Empirical studies have verified the prediction accuracy of TRoLL and demonstrated its superiority over well‐known readability formulas and analysis tools.
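To make the idea of predicting a grade level by regression over several predictors concrete, the following is a minimal sketch and not TRoLL's actual model. It assumes an ordinary least-squares fit; the two numeric features, the toy training rows, and the function names `fit_linear_regression` and `predict_grade` are all hypothetical and chosen only for illustration. The abstract does not specify which metadata features TRoLL uses or how they are weighted.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit ordinary least squares with an intercept term.

    Returns a coefficient vector whose first entry is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_grade(coef, features):
    """Predict a grade level and clamp it to the K-12 range [0, 12]."""
    features = np.asarray(features, dtype=float)
    raw = coef[0] + features @ coef[1:]
    return float(np.clip(raw, 0.0, 12.0))


# Made-up toy data (an assumption, not taken from the paper): each row
# holds two hypothetical metadata-derived scores for one book.
X_train = [[1, 0], [2, 1], [3, 1], [4, 3], [6, 2]]
# Toy grade levels generated from 1 + 0.5 * x1 + 2 * x2.
y_train = [1.5, 4.0, 4.5, 9.0, 8.0]

coef = fit_linear_regression(X_train, y_train)
grade = predict_grade(coef, [2, 2])
```

Clamping the prediction keeps the output inside the K-12 scale even when the linear model extrapolates beyond the training data.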
