Using Automated Item Generation to Promote Principled Test Design and Development

Educational assessment plays an important role in modern society. Teachers use tests to measure students’ strengths and weaknesses and to determine whether students are meeting educational objectives; school administrators use tests to monitor students’ progress and to place students in the appropriate grade; colleges and universities select students based on their performance on standardized tests; parents learn about their children’s performance in each subject through report cards. The diversity of assessment situations is truly impressive. However, this high demand for educational assessment comes at a significant cost: hundreds, if not thousands, of test items must be developed to measure student performance, and item development requires considerable time and effort. In the traditional approach to test construction, each item is individually developed by content specialists. The item is first written, then reviewed, revised, edited, and, finally, administered. Because this process is so lengthy, it is difficult to meet the ever-increasing demand for new test items (Drasgow, Luecht, & Bennett, 2006).

Automated Item Generation (AIG) is an alternative approach to item development that can supplement the traditional approach by using specifically programmed algorithms. The goal of AIG is to produce large numbers of high-quality items that require little human review prior to administration (Williamson, Johnson, Sinharay, & Bejar, 2002). The purpose of this study is to describe and illustrate an approach for developing task models that can be used for AIG with the College Board’s Advanced Placement Program (AP). The College Board supports major programs and services that promote college admissions, guidance, assessment, financial aid, enrollment, teaching, and learning (College