Automated Text Readability Assessment for German Language: A Quality of Experience Approach

Data-driven approaches to readability assessment, which combine automated linguistic analysis with machine learning methods, are a viable way forward for readability ranking. This paper describes the development of an automated readability estimator based on supervised learning algorithms trained on German text corpora. Natural language processing tools are used to extract 73 linguistic features, grouped into traditional, lexical and morphological features. Feature engineering approaches are then employed to select the most informative features, and several supervised learning models are implemented with the top-ranked features as input. The results show that the Random Forest Regressor performs best, with an RMSE of 0.847.
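As a rough illustration of the two ends of this pipeline, the sketch below computes a handful of surface-level ("traditional") features for a German text and defines the RMSE measure used for evaluation. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `surface_features` and `rmse`, the three features shown, and the naive regex-based sentence and word splitting are all hypothetical choices made for this example, and the actual system relies on NLP tools and 73 features that are not reproduced here.

```python
import math
import re


def surface_features(text):
    """Compute a few illustrative surface features for one text.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    A real system would use a proper tokenizer and sentence splitter.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # [^\W\d_]+ matches runs of Unicode letters, so umlauts and ß are kept.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    n_words = len(words)
    return {
        # Average number of words per sentence.
        "avg_sentence_len": n_words / len(sentences),
        # Average number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # Distinct lowercased word forms divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    sample = "Das ist ein Satz. Das ist noch ein Satz!"
    print(surface_features(sample))
    # Hypothetical ratings and predictions, for illustration only.
    print(rmse([3.0, 5.0], [4.0, 4.0]))
```

In a full pipeline, feature dictionaries like these would be stacked into a matrix, filtered by a feature selection step, and passed to a regressor whose predictions are scored with `rmse` against human complexity ratings.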
