Information-based methods for evaluating the semantics of automatically generated test items

Multiple-choice questions are a popular type of test item used to assess the knowledge of health-science students in North America and elsewhere. This article presents recent advances in automatic item generation (AIG) and proposes a novel unsupervised approach that extends the information-based Compositional Distributional Semantic Model (CDSM) to measure semantic relatedness among a pool of automatically generated items. We used an operational item bank from the medical-science domain to develop the CDSM and demonstrated our approach using concepts from AIG research. We illustrated the approach with eleven item models from the medical-education domain and discussed possible applications for advancing AIG research.
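The core idea of measuring semantic relatedness with a compositional distributional model can be sketched as follows: each item's text is mapped to a vector by composing the distributional vectors of its words (here, simple additive composition), and relatedness between two items is the cosine similarity of their composed vectors. This is a minimal illustration only; the word vectors, vocabulary, and item texts below are toy values, and the paper's actual information-based CDSM, trained on an operational medical item bank, is more elaborate.

```python
import math
from itertools import combinations

# Toy 3-dimensional word vectors. In practice these would come from a
# distributional model trained on a large medical corpus; the words and
# numbers here are purely illustrative.
word_vectors = {
    "fever":     [0.9, 0.1, 0.3],
    "infection": [0.8, 0.2, 0.4],
    "fracture":  [0.1, 0.9, 0.2],
    "bone":      [0.2, 0.8, 0.1],
}

def compose(item_text):
    """Additive composition: the item vector is the sum of its word vectors."""
    vec = [0.0, 0.0, 0.0]
    for word in item_text.lower().split():
        wv = word_vectors.get(word)
        if wv is not None:
            vec = [a + b for a, b in zip(vec, wv)]
    return vec

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Pairwise relatedness over a (toy) pool of generated items.
items = ["fever infection", "bone fracture"]
vecs = [compose(t) for t in items]
for (i, u), (j, v) in combinations(enumerate(vecs), 2):
    print(f"sim(item{i}, item{j}) = {cosine(u, v):.3f}")
```

In an AIG setting, such pairwise scores could flag generated items that are near-duplicates of one another (very high similarity) or that drift away from the parent item model (very low similarity).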
