Towards Automatic Scoring of Cloze Items by Selecting Low-Ambiguity Contexts

In second language learning, cloze tests (also known as fill-in-the-blank tests) are frequently used to assess students' learning progress. While the preparation effort for these tests is low, scoring must be done manually, as there is usually a large number of correct solutions. In this paper, we examine whether the ambiguity of cloze items can be lowered to a point where automatic scoring becomes possible. We use the local context of a word to collect evidence of low ambiguity, both by searching for collocated word sequences and by taking structural information at the sentence level into account. We evaluate the effectiveness of this approach in a user study on cloze items ranked by our method. For the top-ranked items (lowest ambiguity), subjects provide the target word significantly more often than for the bottom-ranked items (59.9% vs. 36.5%). While this shows the potential of our method, we did not succeed in fully eliminating ambiguity, so further research is necessary before fully automatic scoring becomes possible.
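To make the collocation-based idea concrete, the sketch below scores a cloze gap by how strongly the target word collocates with its immediate left and right neighbours, using pointwise mutual information over corpus counts. This is only a minimal illustration under assumed inputs: the function names (`pmi`, `context_evidence`), the use of bigram PMI, and the toy counts are assumptions for exposition, not the paper's exact formulation.

```python
from math import log2

def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information of a word pair from corpus counts."""
    if min(count_xy, count_x, count_y) == 0:
        return float("-inf")
    return log2((count_xy / total) / ((count_x / total) * (count_y / total)))

def context_evidence(left: str, target: str, right: str,
                     unigrams: dict, bigrams: dict, total: int) -> float:
    """Score a cloze item by how strongly its neighbours collocate with the target.

    Higher scores suggest the local context constrains the gap more tightly,
    i.e. the item is presumably less ambiguous.
    """
    left_score = pmi(bigrams.get((left, target), 0),
                     unigrams.get(left, 0), unigrams.get(target, 0), total)
    right_score = pmi(bigrams.get((target, right), 0),
                      unigrams.get(target, 0), unigrams.get(right, 0), total)
    return max(left_score, right_score)

# Toy usage with made-up counts for the gap in "strong ___ of coffee" (target: "cup").
unigrams = {"strong": 1200, "cup": 900, "of": 50000}
bigrams = {("strong", "cup"): 40, ("cup", "of"): 700}
print(context_evidence("strong", "cup", "of", unigrams, bigrams, total=1_000_000))
```

Items whose gaps receive high evidence scores would be ranked towards the top (low ambiguity); in the actual method, structural information at the sentence level is also taken into account, which this sketch omits.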
