Scoring Subscales Using Multidimensional Item Response Theory Models

Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bifactor model, in which each subtest score was a composite based on the primary trait measured by the full set of tests and a secondary trait measured by the individual subtest, and a model in which the traits measured by the subtests were separate but correlated. Composite scores based on unidimensional item response theory, with each subtest borrowing information from the other subtests, as well as independent unidimensional scores for each subtest, were also considered. Correlations among scores from all methods were high, though somewhat lower for the independent unidimensional scores. Correlations between course grades and test scores, a measure of validity, were similar for all methods, though again slightly lower for the unidimensional scores. To assess bias and RMSE, data were simulated using the parameters estimated for the correlated factors model. The independent unidimensional scores showed the greatest bias and RMSE; the relative performance of the other three methods varied with the subscale.

Scoring Subscales Using Multidimensional Item Response Theory Models

Tests are often designed such that each item measures the primary trait and one additional secondary trait. The secondary traits may reflect different content categories in the test blueprint, or different tests within a battery of tests. In this situation, test users may want subscale scores, each of which reflects both the primary trait and the relevant secondary trait. Two multidimensional item response theory (MIRT) models are potentially useful in this context: a model with n correlated traits, where n is the number of subscales, or a bifactor model with one primary trait and n orthogonal secondary traits.
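The structural difference between the two MIRT models can be seen in their discrimination (loading) matrices. The sketch below, a minimal illustration with hypothetical item parameters (not taken from the study), contrasts a bifactor matrix, where every item loads on the primary trait and at most one orthogonal secondary trait, with a correlated-traits matrix, where each item loads on its subtest's trait only, using the compensatory MIRT extension of the 2PL model:

```python
import numpy as np

def p_correct(a, theta, d):
    """Compensatory MIRT 2PL: P(x = 1 | theta) = 1 / (1 + exp(-(a @ theta + d)))."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

# Hypothetical battery: 6 items forming two 3-item subtests.

# Bifactor structure: column 0 is the primary trait; columns 1-2 are
# orthogonal secondary traits, each loaded on by one subtest only.
a_bifactor = np.array([
    [1.2, 0.8, 0.0],
    [1.0, 0.6, 0.0],
    [0.9, 0.7, 0.0],
    [1.1, 0.0, 0.5],
    [1.3, 0.0, 0.9],
    [0.8, 0.0, 0.6],
])

# Correlated-traits structure: one trait per subtest, no general factor;
# the two traits are allowed to correlate in the latent distribution.
a_correlated = np.array([
    [1.4, 0.0],
    [1.2, 0.0],
    [1.1, 0.0],
    [0.0, 1.2],
    [0.0, 1.5],
    [0.0, 1.0],
])

d = np.zeros(6)  # item intercepts, set to 0 for simplicity

theta_bi = np.array([1.0, 0.5, -0.5])  # primary trait plus two secondary traits
theta_corr = np.array([1.0, 0.3])      # two correlated subtest traits

probs_bi = p_correct(a_bifactor, theta_bi, d)
probs_corr = p_correct(a_correlated, theta_corr, d)
print(probs_bi)
print(probs_corr)
```

In the bifactor matrix the zeros enforce the constraint that each item loads on no more than two traits; in the correlated-traits matrix the overlap among subtests is carried entirely by the correlation between the latent traits rather than by a shared factor.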
An additional approach applies unidimensional IRT in the initial scoring of each subscale but then borrows information from correlated subscales in forming the final subscale scores.

In the bifactor model, all items are specified to load on the primary factor. Additionally, each item may load on one additional factor. The factors are orthogonal (Gibbons & Hedeker, 1992; McLeod, Swygert, & Thissen, 2001). In other words, a secondary factor is the common factor a group of items shares beyond their shared association with the primary factor. Hierarchical is a more general term for this class of models; bifactor emphasizes that each item loads on no more than two traits, including the primary trait. With the bifactor model, scores can be estimated for the primary trait and each secondary trait. On a battery of tests, though, it would seem desirable for each subtest score to measure the overall construct covered by the subtest, not just the part of the construct not covered by the primary factor. In other words, the score should be a combination of the primary trait and the secondary trait, not the secondary trait alone.

To quantify the relative weights of the factors contributing to an item response, Reckase (1985; 1997; Reckase & McKinley, 1991) defined the direction of greatest slope for item i as