A Comparison of the Performance of Simulated Hierarchical and Linear Testlets

A series of computer simulations was run to measure how testlet validity depends on two factors, item pool size and testlet length, for both adaptively and linearly constructed testlets. The results confirm the generality of earlier empirical findings (Wainer, Lewis, Kaplan, & Braswell, 1991): making a testlet adaptive yields only modest increases in aggregate validity, because the typical proficiency distribution is sharply peaked.
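
To make the comparison concrete, the following is a minimal sketch of how such a simulation might be set up. It is not the design used in the study. The Rasch response model, the standard normal proficiency distribution, the evenly spaced linear difficulties, the step-halving branching rule, and the function names (`simulate`, `linear_score`, `hierarchical_score`) are all assumptions made for illustration. Validity is taken here to be the Pearson correlation between testlet score and true proficiency.

```python
import math
import random


def p_correct(theta, b):
    """Rasch model (assumed): probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def linear_score(theta, difficulties, rng):
    """Number-correct score on a fixed (linear) testlet."""
    return sum(rng.random() < p_correct(theta, b) for b in difficulties)


def hierarchical_score(theta, length, rng, step=1.0):
    """Score on a tree-structured (hierarchical) testlet.

    Assumed branching rule: start at difficulty 0, move to a harder item
    after a correct response and an easier one after an error, halving
    the step each time. The score is the final position in the tree.
    """
    b = 0.0
    for _ in range(length):
        correct = rng.random() < p_correct(theta, b)
        b += step if correct else -step
        step /= 2.0
    return b


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(length=4, n_examinees=5000, seed=0):
    """Return (linear validity, hierarchical validity) for one condition."""
    rng = random.Random(seed)
    # Peaked proficiency distribution (assumed standard normal).
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    # Linear testlet: difficulties evenly spaced on [-1, 1] (assumed).
    if length > 1:
        difficulties = [-1.0 + 2.0 * i / (length - 1) for i in range(length)]
    else:
        difficulties = [0.0]
    lin = [linear_score(t, difficulties, rng) for t in thetas]
    hier = [hierarchical_score(t, length, rng) for t in thetas]
    return pearson(thetas, lin), pearson(thetas, hier)


if __name__ == "__main__":
    for length in (2, 4, 6):
        lin_v, hier_v = simulate(length=length)
        print(f"length={length}: linear={lin_v:.3f}, hierarchical={hier_v:.3f}")
```

Under this branching rule, a hierarchical testlet of length n draws on a pool of 2^n - 1 items, whereas the linear testlet needs only n. Varying `length` therefore changes both factors at once for the adaptive form. The sketch makes no claim about the size of the validity difference, which depends on the assumptions listed above.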