Rule-based and machine learning approaches for second language sentence-level readability

We present approaches for identifying sentences that second language learners of Swedish can understand, for use in automatically generated, corpus-based exercises. In this work we combined methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses. The proposed selection methods have also been implemented as a module in a free web-based language learning platform. Users can apply different parameters and linguistic filters to personalize their sentence search, with or without a machine learning component assessing readability. The selected sentences have already found practical use as multiple-choice exercise items within the same platform. Of the deep linguistic indicators explored, we found mainly lexical-morphological and semantic features informative for second language sentence-level readability. We obtained a readability classification accuracy of 71%, which approaches the performance of other models used in similar tasks. Furthermore, in an empirical evaluation with teachers and students, about seven out of ten selected sentences were considered understandable, with the rule-based approach slightly outperforming the method incorporating the machine learning model.
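The abstract does not list the concrete filters or thresholds, so the following is only a rough Python sketch of how rule-based filters and an optional readability classifier could be chained. Everything in it is hypothetical and is not the platform's actual implementation. This includes `FilterConfig`, its parameters and thresholds, the heuristics (capitalization, final punctuation, no digits, share of known words), the toy vocabulary set, and the `classifier` callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set

PUNCT = ".,!?;:\"'()"


@dataclass
class FilterConfig:
    # Hypothetical filter parameters, chosen for illustration only.
    min_tokens: int = 6
    max_tokens: int = 20
    min_known_ratio: float = 0.9  # required share of tokens found in known_words
    known_words: Set[str] = field(default_factory=set)  # e.g. a learner word list


def tokenize(sentence: str) -> List[str]:
    """Naive whitespace tokenizer: strips surrounding punctuation, lowercases."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def passes_rules(sentence: str, cfg: FilterConfig) -> bool:
    """Apply simple rule-based checks (hypothetical GDEX-style heuristics)."""
    text = sentence.strip()
    # Well-formedness: initial capital letter and sentence-final punctuation.
    if not text[:1].isupper() or not text.endswith((".", "!", "?")):
        return False
    # Reject sentences containing digits (dates, numbers), which may lack context.
    if any(ch.isdigit() for ch in text):
        return False
    tokens = tokenize(text)
    if not tokens or not (cfg.min_tokens <= len(tokens) <= cfg.max_tokens):
        return False
    # Lexical check: most tokens should appear in the vocabulary list.
    known = sum(1 for t in tokens if t in cfg.known_words)
    return known / len(tokens) >= cfg.min_known_ratio


def select_sentences(
    sentences: Iterable[str],
    cfg: FilterConfig,
    classifier: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Keep sentences that pass the rules and, if given, the classifier."""
    selected = []
    for s in sentences:
        if not passes_rules(s, cfg):
            continue
        # Optional machine learning step: any callable returning True for
        # sentences judged readable (a stand-in for a trained model).
        if classifier is not None and not classifier(s):
            continue
        selected.append(s)
    return selected


# Usage with a toy vocabulary (hypothetical data).
cfg = FilterConfig(known_words={"jag", "har", "en", "stor", "hund", "och", "liten", "katt"})
candidates = [
    "Jag har en stor hund och en liten katt.",
    "Hunden sprang.",                           # too short
    "jag har en stor hund och en liten katt",   # not well-formed
]
print(select_sentences(candidates, cfg))  # ['Jag har en stor hund och en liten katt.']
```

In this sketch, passing `classifier=None` corresponds to a search without the machine learning component. Running the cheap rules first means the classifier only scores sentences that already look usable.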
