Better Data From Better Measurements Using Computerized Adaptive Testing

The process of constructing a fixed-length conventional test frequently focuses on maximizing internal consistency reliability by selecting test items that are of average difficulty and high discrimination (a “peaked” test). The effect of constructing such a test, when viewed from the perspective of item response theory, is test scores that are precise for examinees whose trait levels are near the point at which the test is peaked; as examinee trait levels deviate from the mean, the precision of their scores decreases substantially. Results of a small simulation study demonstrate that when peaked tests are “off target” for examinees, their scores are biased and have spuriously high standard deviations, reflecting substantial amounts of error. These errors can reduce the correlations of these kinds of scores with other variables and adversely affect the results of standard statistical tests. By contrast, scores from adaptive tests are essentially unbiased and have standard deviations that are much closer to true values. Basic concepts of adaptive testing are introduced, and fully adaptive computerized tests (CATs) based on IRT are described. Several examples of response records from CATs are discussed to illustrate how CATs function. Some operational issues, including item exposure, content balancing, and enemy items, are also briefly discussed. It is concluded that because a CAT constructs a unique test for each examinee, scores from CATs will be more precise and should provide better data for social science research and applications.
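The adaptive logic the abstract describes can be illustrated with a minimal sketch: under a 2PL IRT model, a CAT repeatedly picks the unused item with maximum Fisher information at the current ability estimate, scores the response, and updates the estimate. Everything below (the pool size, parameter ranges, test length, and the EAP grid) is an illustrative assumption, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2PL item pool: a = discrimination, b = difficulty.
N_ITEMS = 200
a = rng.uniform(0.8, 2.0, N_ITEMS)
b = rng.normal(0.0, 1.0, N_ITEMS)

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def run_cat(true_theta, test_length=20):
    """Administer a maximum-information CAT; return the final EAP estimate."""
    grid = np.linspace(-4, 4, 161)              # quadrature grid for EAP scoring
    posterior = np.exp(-0.5 * grid ** 2)        # standard-normal prior (unnormalized)
    administered = []
    for _ in range(test_length):
        theta_hat = np.sum(grid * posterior) / np.sum(posterior)  # current EAP
        info = item_information(theta_hat, a, b)
        info[administered] = -np.inf            # never reuse an administered item
        item = int(np.argmax(info))             # maximum-information selection
        administered.append(item)
        # Simulate the examinee's response at the true ability level.
        correct = rng.random() < p_correct(true_theta, a[item], b[item])
        like = p_correct(grid, a[item], b[item])
        posterior *= like if correct else (1.0 - like)
    theta_hat = np.sum(grid * posterior) / np.sum(posterior)
    return theta_hat, administered

theta_hat, items = run_cat(true_theta=1.0)
```

Because each selection step targets the current provisional estimate, every examinee effectively receives a test peaked at their own trait level, which is why CAT avoids the off-target precision loss of a fixed peaked test.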
