Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation

We present a framework, and its implementation based on Natural Language Processing methods, for identifying exercise item candidates in corpora. The hybrid system combines heuristics with machine learning methods and covers a number of relevant selection criteria. We focus on two fundamental aspects: linguistic complexity and the dependence of the extracted sentences on their original context. Previous work on exercise generation has addressed these two criteria only to a limited extent, and a refined overall framework for candidate sentence selection also appears to be lacking. In addition to a detailed description of the system, we present the results of an empirical evaluation conducted with language teachers and learners, which indicate that the system is useful for educational purposes. We have integrated our system into a freely available online learning platform.
