Text Readability Assessment for Second Language Learners

This paper addresses the task of readability assessment for texts aimed at second language (L2) learners. One of the major challenges in this task is the lack of sufficiently large level-annotated datasets. For the present work, we collected a dataset of CEFR-graded texts tailored for learners of English as an L2 and investigated text readability assessment for both native readers and L2 learners. We applied a generalization method to adapt models trained on larger native corpora to estimate text readability for learners, and explored domain adaptation and self-learning techniques that use the native data to improve system performance on the limited L2 data. In our experiments, the best-performing model for readability on learner texts achieves an accuracy of 0.797 and a Pearson correlation coefficient (PCC) of 0.938.
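As a rough illustration of the two evaluation measures named above, the following minimal sketch computes accuracy and PCC for a set of predicted CEFR levels. The integer mapping of the six CEFR levels, the helper names, and the toy gold and predicted labels are assumptions made for this example only; they are not the paper's data, label set, or evaluation code.

```python
from math import sqrt

# Assumption: CEFR levels are treated as an ordinal scale and mapped to integers.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_TO_INT = {level: i for i, level in enumerate(CEFR_LEVELS)}


def accuracy(gold, pred):
    """Fraction of texts whose predicted level exactly matches the gold level."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy labels, invented purely for illustration.
gold = ["A2", "B1", "B2", "C1", "C2", "B1"]
pred = ["A2", "B1", "C1", "C1", "C2", "B2"]

acc = accuracy(gold, pred)
pcc = pearson([LEVEL_TO_INT[g] for g in gold],
              [LEVEL_TO_INT[p] for p in pred])
print(f"accuracy={acc:.3f}  PCC={pcc:.3f}")
```

Accuracy rewards only exact level matches, while PCC also gives credit to predictions that land on a neighbouring level, which is why the two figures can differ considerably for the same system.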
