Evaluation of automatically generated English vocabulary questions

This paper describes the evaluation experiments for questions created by an automatic question generation system. Given a target word and one of its word senses, the system generates a multiple-choice English vocabulary question that asks for the option closest in meaning to the target word as it is used in a reading passage. Two kinds of evaluation were conducted, covering two aspects of the generated questions: (1) their ability to measure English learners' proficiency and (2) their similarity to human-made questions. The first evaluation is based on responses from English learners to whom both the machine-generated and human-made questions were administered, and the second is based on subjective judgements by English teachers. Both evaluations showed that the machine-generated questions reached a level comparable to the human-made questions, both in measuring English proficiency and in similarity.
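
To illustrate how learner responses can support the first kind of evaluation, the sketch below computes two standard classical-test-theory item statistics, difficulty (proportion correct) and discrimination (point-biserial correlation between an item score and the rest-of-test score), which could be compared between machine-generated and human-made items. This is a minimal illustrative sketch, not the paper's exact analysis; the function name and the toy response data are hypothetical.

```python
# Hedged sketch of a classical item analysis over learner responses.
# difficulty  = proportion of learners answering the item correctly
# discrimination = point-biserial correlation of the item score with the
#                  total score on the remaining items (rest-of-test score)
from statistics import mean, pstdev

def item_statistics(responses):
    """responses: list of per-learner lists of 0/1 item scores (equal length)."""
    n_items = len(responses[0])
    stats = []
    for i in range(n_items):
        item_scores = [r[i] for r in responses]
        rest_totals = [sum(r) - r[i] for r in responses]  # exclude item i
        difficulty = mean(item_scores)
        sd_item, sd_rest = pstdev(item_scores), pstdev(rest_totals)
        if sd_item == 0 or sd_rest == 0:
            discrimination = 0.0
        else:
            cov = (mean(x * y for x, y in zip(item_scores, rest_totals))
                   - mean(item_scores) * mean(rest_totals))
            discrimination = cov / (sd_item * sd_rest)
        stats.append((difficulty, discrimination))
    return stats

# Toy example: 5 learners answering 3 items (1 = correct, 0 = incorrect).
toy_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
for idx, (p, r_pb) in enumerate(item_statistics(toy_responses), start=1):
    print(f"item {idx}: difficulty={p:.2f}, discrimination={r_pb:.2f}")
```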
