Educational assessment plays an important role in modern society. Teachers use tests to measure students' strengths and weaknesses and to determine whether students are meeting educational objectives; school administrators use tests to monitor students' progress and to place students in the appropriate grade; colleges and universities select students based on their performance on standardized tests; parents learn about their children's performance in each subject through report cards. The diversity of assessment situations is truly impressive.

This high demand for educational assessment, however, comes at a significant cost. Hundreds, if not thousands, of test items must be developed to measure student performance, and developing them requires substantial money, time, and effort. In the traditional approach to test construction, each item is developed individually by content specialists: the item is first written, then reviewed, revised, and edited before it is finally administered. Because this process is so lengthy, it is difficult to meet the ever-increasing demand for new test items (Drasgow, Luecht, & Bennett, 2006). Automated Item Generation (AIG) is an alternative approach to item development that can supplement the traditional approach by using specifically programmed algorithms. The goal of AIG is to produce large numbers of high-quality items that require little human review prior to administration (Williamson, Johnson, Sinharay, & Bejar, 2002).

The purpose of this study is to describe and illustrate an approach for developing task models that can be used for AIG with the College Board's Advanced Placement Program (AP). The College Board supports major programs and services that promote college admissions, guidance, assessment, financial aid, enrollment, teaching, and learning (College
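To make the idea of "specifically programmed algorithms" concrete, the following minimal Python sketch illustrates one common realization of template-based AIG: a parent item (the stem) contains variable slots, and the generator instantiates every admissible combination of values to produce a family of sibling items. The stem, variable names, value ranges, and constraint below are invented purely for illustration; they are assumptions, not material from the AP program or from this study.

```python
from itertools import product

# Hypothetical item model: a stem with variable slots that all
# generated sibling items will share (illustrative example only).
STEM = "A car travels {distance} km in {hours} hours. What is its average speed in km/h?"

# Admissible values for each variable slot (illustrative ranges only).
VARIABLES = {
    "distance": [120, 150, 180, 240],
    "hours": [2, 3, 4],
}

def generate_items(stem, variables):
    """Instantiate the stem with every combination of variable values,
    keeping only combinations that yield a whole-number answer key."""
    items = []
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        bindings = dict(zip(names, values))
        if bindings["distance"] % bindings["hours"]:  # constraint: integer answer
            continue
        items.append({
            "text": stem.format(**bindings),
            "key": bindings["distance"] // bindings["hours"],
        })
    return items

for item in generate_items(STEM, VARIABLES):
    print(item["text"], "->", item["key"])
```

Even this toy model yields eleven distinct items from one stem; in practice, a well-specified task model with more slots and richer constraints is what lets AIG scale item production far beyond what individual hand-authoring allows.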
[1] Robert J. Mislevy, et al. Automated scoring of complex tasks in computer-based testing, 2006.
[2] Matthew S. Johnson, et al. Calibration of polytomous item families using Bayesian hierarchical modeling, 2003.
[3] T. Haladyna. Developing and Validating Multiple-Choice Test Items, 1994.
[4] Wayne J. Camara, et al. Choosing students: higher education admissions tools for the 21st century, 2005.
[5] James W. Pellegrino, et al. Technology and Testing, Science, 2009.
[6] Jiawen Zhou. A Review of Assessment Engineering Principles with Select Applications to the Certified Public Accountant Examination (Technical Report), 2010.
[7] Russell G. Almond, et al. On the Roles of Task Model Variables in Assessment Design, 1999.
[8] David M. Williamson, et al. Hierarchical IRT Examination of Isomorphic Equivalence of Complex Constructed Response Tasks, 2002.
[9] Mark J. Gierl, et al. Developing a Taxonomy of Item Model Types to Promote Assessment Engineering, 2008.
[10] R. Almond, et al. A Brief Introduction to Evidence-Centered Design, 2003.