Improving Access to Digital Library Resources by Automatically Generating Complete Reading Level Metadata

Digital library collections usually hold resources that cover a limited set of topics but span a wide range of reading levels, so complete reading level metadata is needed to filter relevant resources from the collection. To suggest a reading level for every resource in the test collection, we propose an SVM-based classification tool that predicts the specific reading level with an F-measure of 0.70 across all resources, outperforming the other classification methods and readability formulas under evaluation. To measure the impact of reading level metadata completeness on retrieval performance, a knowledge-based system retrieves documents from three collections that differ in reading level completeness: one with complete reading level information generated by the proposed SVM method, one missing all reading level information, and one containing limited metadata provided by human experts. The collection with automatically generated, complete reading level metadata outperforms the collection with collection-provided reading level metadata on all five sample tasks.
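
As an illustration of how an F-measure over reading level predictions can be computed, the following minimal Python sketch scores predicted labels against gold labels. The label names, the toy data, and the choice of macro-averaging are assumptions made for this example; the abstract does not state which averaging scheme or label set was used.

```python
from collections import Counter


def f_measure_per_class(gold, predicted):
    """Return {label: F1} for every label seen in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in set(gold) | set(predicted):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f_measure(gold, predicted):
    """Unweighted mean of the per-class F1 scores (macro-averaging is an assumption)."""
    scores = f_measure_per_class(gold, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical reading level labels, not taken from the paper.
gold = ["elementary", "middle", "high", "high", "middle", "elementary"]
pred = ["elementary", "middle", "middle", "high", "middle", "high"]

if __name__ == "__main__":
    print(f_measure_per_class(gold, pred))
    print(round(macro_f_measure(gold, pred), 3))
```

Macro-averaging weights every reading level equally, which matters when some levels have few resources; a micro-averaged or support-weighted score would be an equally plausible reading of the reported figure.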
