Assessing CAT Test Security Severity

In addition to their precision advantage over nonadaptive tests, computerized adaptive tests (CATs) have another well-known advantage: they can be offered on a continuous basis. Continuous testing gives examinees flexibility in scheduling, and it saves schools and other testing centers both space and the number of computers required for testing. Unfortunately, continuous testing capability is also the major weakness of CATs, because continuous testing implies continuous item exposure. Examinees who take a test earlier may share information with examinees who take it later, so many items may become known to examinees before their actual test dates.
Chang and Zhang (2002, 2003) initiated the theoretical work on assessing test security severity for high-stakes CATs. Their derivation is based on a randomized item selection procedure that equalizes item exposure and, hence, provides the best test security control in CATs. Let Zα be the number of items that can be compromised by α "professional" test takers, assuming each person can memorize β items. Based on this derivation, E[Zα], the expected value of Zα, can be computed analytically under various test settings. Because it is derived under the most secure selection procedure, E[Zα] yields an upper bound: it indicates at most how many "professional" test takers are needed to compromise a sizable percentage of an item pool. The software AddChart Application (Yi, Zhang, & Chang, 2005) was developed to examine the relationship among item pool size, the number of items each "professional" test taker can memorize, and the percentage of the item pool that can be compromised. Here, "professional" test takers are examinees who either are employed by test preparation organizations or have taken the same test several times to boost their scores; as such, they are all motivated to memorize as many items as possible.
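A minimal sketch of how E[Zα] can be computed analytically, under a simplifying assumption that is ours rather than the source's: each "professional" test taker independently memorizes a uniformly random set of β distinct items from a pool of N items (an idealization of the equalized exposure produced by randomized item selection). An item is then missed by one test taker with probability 1 - β/N and by all α test takers with probability (1 - β/N)^α, so E[Zα] = N[1 - (1 - β/N)^α]. The pool size N, the function names, and this exact closed form are illustrative assumptions, not AddChart Application's implementation or necessarily the exact form in Chang and Zhang's derivation. A Monte Carlo simulation under the same assumption is included as a check on the algebra.

```python
import random


def expected_compromised(pool_size: int, memorized: int, takers: int) -> float:
    """Closed-form E[Z_alpha] under a uniform-memorization assumption.

    Assumption (illustrative): each of `takers` (alpha) test takers
    independently memorizes `memorized` (beta) distinct items drawn
    uniformly at random from `pool_size` (N) items.
    One taker misses a given item with probability 1 - beta/N, so all
    alpha takers miss it with probability (1 - beta/N) ** alpha.
    """
    if not 0 <= memorized <= pool_size:
        raise ValueError("memorized must be between 0 and pool_size")
    miss_prob = (1 - memorized / pool_size) ** takers
    return pool_size * (1 - miss_prob)


def simulate_compromised(pool_size: int, memorized: int, takers: int,
                         reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[Z_alpha] under the same assumption."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        seen = set()  # items memorized by at least one taker
        for _ in range(takers):
            # Each taker memorizes beta distinct items chosen at random.
            seen.update(rng.sample(range(pool_size), memorized))
        total += len(seen)
    return total / reps


if __name__ == "__main__":
    N, beta = 500, 10  # hypothetical pool size and items memorized per taker
    for alpha in (5, 20, 50):
        exact = expected_compromised(N, beta, alpha)
        approx = simulate_compromised(N, beta, alpha)
        print(f"alpha={alpha:3d}  E[Z]={exact:7.2f}  simulated={approx:7.2f}")
```

Running the script prints the closed-form value next to the simulated average for a few values of α; close agreement only confirms the algebra under the stated assumption, not the behavior of any real item pool.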
For example, for a given item pool, AddChart Application can show the number of "professional" test takers needed to compromise various percentages of the pool, given that each person can memorize β items. The program can therefore help practitioners and researchers design more secure CATs by examining how the number of "professional" test takers needed relates to the percentage of the item pool that is compromised. The objective of the software is to provide a theoretical upper bound under various test settings.
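The kind of table described above can be approximated with a short sketch, again assuming (our assumption, not the source's) that each test taker independently memorizes a uniformly random set of β distinct items from a pool of N items. For each target percentage p, it finds the smallest α whose expected compromised share E[Zα]/N = 1 - (1 - β/N)^α reaches p. The function names `takers_needed` and `takers_table`, the default percentage grid, and the example pool size are hypothetical, and this is not the program's actual code.

```python
from typing import Dict, Iterable


def takers_needed(pool_size: int, memorized: int, percent: int) -> int:
    """Smallest alpha whose expected compromised share reaches `percent`%.

    Assumption (illustrative): each taker independently memorizes
    `memorized` (beta) distinct items drawn uniformly from `pool_size` (N).
    Condition 1 - (1 - beta/N)**alpha >= percent/100 is checked exactly as
        100 * (N - beta)**alpha <= (100 - percent) * N**alpha.
    """
    if not 0 < memorized <= pool_size:
        raise ValueError("memorized must be in 1..pool_size")
    if not 0 <= percent < 100:
        raise ValueError("percent must be in 0..99")
    alpha = 0
    lhs = 100            # 100 * (N - beta) ** alpha, starting at alpha = 0
    rhs = 100 - percent  # (100 - percent) * N ** alpha, starting at alpha = 0
    while lhs > rhs:     # target share not yet reached
        lhs *= pool_size - memorized
        rhs *= pool_size
        alpha += 1
    return alpha


def takers_table(pool_size: int, memorized: int,
                 percents: Iterable[int] = (10, 20, 30, 40, 50, 60, 70, 80, 90)
                 ) -> Dict[int, int]:
    """Map each target percentage to the number of takers needed."""
    return {p: takers_needed(pool_size, memorized, p) for p in percents}


if __name__ == "__main__":
    N, beta = 500, 10  # hypothetical pool size and items memorized per taker
    for p, alpha in takers_table(N, beta).items():
        print(f"{p:2d}% of pool compromised (expected) after {alpha} takers")
```

The comparison uses exact integer arithmetic so that boundary cases, such as reaching exactly 10% after one test taker when β/N = 0.1, are not lost to floating-point rounding.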