Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation

We present a framework, together with its implementation, that relies on Natural Language Processing methods to identify candidate sentences for exercise items in corpora. The hybrid system combines heuristics and machine learning methods and incorporates a number of relevant selection criteria. We focus on two fundamental aspects: the linguistic complexity of the extracted sentences and their dependence on their original context. Previous work on exercise generation addressed these two criteria only to a limited extent, and a refined, comprehensive framework for candidate sentence selection also appears to be lacking. In addition to a detailed description of the system, we present the results of an empirical evaluation conducted with language teachers and learners, which indicate that the system is useful for educational purposes. We have integrated our system into a freely available online learning platform.
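
As a rough illustration of how a hybrid selection pipeline of this kind might be organized, the sketch below first applies hard heuristic filters (sentence length, and a sentence-initial word suggesting dependence on preceding context) and then ranks the surviving sentences by linguistic complexity. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`tokenize`, `heuristic_filters`, `lix`, `select`), the word list, the length limits and the `max_lix` threshold are all hypothetical. The LIX readability formula, a classic Swedish readability measure applied here to single sentences, stands in for the machine learning component of the system described above.

```python
import re

# HYPOTHETICAL list of sentence-initial words that often signal dependence on
# preceding context (pronouns, demonstratives, connectives). English
# placeholders for illustration only; not the paper's actual criteria.
CONTEXT_DEPENDENT_STARTERS = {
    "he", "she", "it", "they", "this", "these", "that", "those",
    "however", "therefore", "also", "moreover",
}


def tokenize(sentence):
    """Lowercased word tokens (Unicode-aware, so letters like å, ä, ö are kept)."""
    return re.findall(r"\w+", sentence.lower())


def heuristic_filters(tokens, min_len=5, max_len=25):
    """Return the reasons (if any) for rejecting a sentence outright.
    The length limits are hypothetical defaults."""
    reasons = []
    if not min_len <= len(tokens) <= max_len:
        reasons.append("length")
    if tokens and tokens[0] in CONTEXT_DEPENDENT_STARTERS:
        reasons.append("context-dependent start")
    return reasons


def lix(tokens):
    """LIX for a single sentence: words per sentence (here, all tokens)
    plus the percentage of long words (more than 6 letters).
    Used as a simple stand-in for a learned complexity model."""
    if not tokens:
        return 0.0
    long_words = sum(1 for t in tokens if len(t) > 6)
    return len(tokens) + 100.0 * long_words / len(tokens)


def select(sentences, max_lix=45.0):
    """Keep sentences that pass the heuristic filters and the (hypothetical)
    complexity threshold, ranked from least to most complex."""
    accepted = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        reasons = heuristic_filters(tokens)
        score = lix(tokens)
        if score > max_lix:
            reasons.append("too complex")
        if not reasons:
            accepted.append((score, sentence))
    accepted.sort()
    return [sentence for _, sentence in accepted]


examples = [
    "The cat sleeps on the warm kitchen floor.",
    "It was also mentioned earlier in the chapter.",
    "Yes.",
    "International organizations coordinate humanitarian assistance programmes internationally.",
    "My sister reads a new book every week.",
]
print(select(examples))
```

With these inputs the sketch rejects the sentence opening with "It" (possible context dependence), the one-word sentence (length) and the sentence made up entirely of long words (LIX of 107), and keeps the two remaining sentences ordered by LIX (8.0, then 20.5). Keeping the heuristics as hard filters and using complexity only for ranking is one plausible design; a real system could equally fold both into a single learned score.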
