Does Standard Deviation Matter? Using “Standard Deviation” to Quantify Security of Multistage Testing

With the advent of web-based technology, online testing is becoming a mainstream mode in large-scale educational assessments. Most online tests are administered continuously within a testing window, which may pose test security problems because examinees who take the test earlier may share information with those who take it later. Researchers have proposed various statistical indices to assess test security; one of the most frequently used is the average test-overlap rate, which was further generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as means (that is, the expected proportion of common items among examinees), and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean but also the standard deviation (SD) of the test overlap rate, as we advocate in this paper. The SD adds important information to the test security profile because, for the same mean, a large SD indicates that certain groups of examinees share more common items than other groups. In this study, we analytically derived the lower bounds of the SD under MST, with the results under CAT as a benchmark. We show that when the mean overlap rate is the same for MST and CAT, the SD of test overlap tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under single-pool versus multiple-pool designs; both the analytical and simulation results show that the non-overlapping multiple-pool design slightly increases the security risk.
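To make the distinction between the mean and the SD of test overlap concrete, the following minimal Python sketch computes both over all pairs of examinees. It is an illustration only, not the analytical derivation or the simulation reported in the paper. The function name `overlap_rates`, the toy item IDs, and the use of the population SD over examinee pairs are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean, pstdev


def overlap_rates(tests):
    """Return the overlap rate for every pair of examinees.

    Each test is a list of item IDs; all tests are assumed to have the
    same fixed length. The overlap rate of a pair is the number of shared
    items divided by the test length.
    """
    length = len(tests[0])
    return [len(set(a) & set(b)) / length for a, b in combinations(tests, 2)]


# Hypothetical "CAT-like" administration: every pair of examinees shares
# exactly 2 of 6 items, so overlap is spread evenly across pairs.
cat_like = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 7, 8, 9, 10],
    [3, 4, 7, 8, 11, 12],
    [5, 6, 9, 10, 11, 12],
]

# Hypothetical "MST-like" administration: examinees are routed to one of
# two fixed paths, so a pair shares either all items or none.
mst_like = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [7, 8, 9, 10, 11, 12],
]

for name, tests in (("CAT-like", cat_like), ("MST-like", mst_like)):
    rates = overlap_rates(tests)
    print(f"{name}: mean = {mean(rates):.3f}, SD = {pstdev(rates):.3f}")
```

In this toy setup both administrations have the same mean overlap rate across pairs, but only the path-based one has a nonzero SD, because overlap is concentrated in the examinees who were routed to the same path.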

[1] Hua-Hua Chang. Understanding Computerized Adaptive Testing: From Robbins-Monro to Lord and Beyond, 2004.

[2] Judith A. Spray, et al. The Relationship Between Item Exposure and Test Overlap in Computerized Adaptive Testing, 2003.

[3] Richard M. Luecht, et al. Some Practical Examples of Computer‐Adaptive Sequential Testing, 1998.

[4] Cornelis A.W. Glas, et al. Computerized adaptive testing: theory and practice, 2000.

[5] Hua-Hua Chang, et al. The effectiveness of using multiple item pools to increase test security in computerized adaptive testing, 2011.

[6] Matthew Finkelman, et al. A Conditional Exposure Control Method for Multidimensional Adaptive Testing, 2009.

[7] Donovan R. Hare, et al. Automated Simultaneous Assembly of Multistage Testlets for a High-Stakes Licensing Examination, 2007.

[8] Barbara S. Plake, et al. Monte Carlo Simulation Comparison of Two-Stage Testing and Computerized Adaptive Testing, 1993.

[9] Manfred Steffen, et al. The GRE Computer Adaptive Test: Operational Issues, 2000.

[11] Hua-Hua Chang, et al. Comparing single-pool and multiple-pool designs regarding test security in computerized testing, 2012, Behavior research methods.

[12] Walter D. Way. Protecting the Integrity of Computerized Testing Item Pools, 1998.

[13] Amy Hendrickson. An NCME Instructional Module on Multistage Testing, 2007.

[14] Hua-Hua Chang, et al. Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study, 2008.

[15] Z. Ying, et al. a-Stratified Multistage Computerized Adaptive Testing, 1999.

[16] Bernard P. Veldkamp, et al. Constructing rotating item pools for constrained adaptive testing, 2002.

[17] David Kaplan, et al. The Sage handbook of quantitative methodology for the social sciences, 2004.

[18] Hua-Hua Chang, et al. Hypergeometric family and item overlap rates in computerized adaptive testing, 2002.

[19] Hua-Hua Chang, et al. The maximum priority index method for severely constrained item selection in computerized adaptive testing, 2009, The British journal of mathematical and statistical psychology.

[20] Martha L. Stocking, et al. Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing, 1995.