Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bifactor model, in which each subtest score was a composite based on the primary trait measured by the set of tests and a secondary trait measured by the individual subtest, and a model in which the traits measured by the subtests were separate but correlated. Composite scores based on unidimensional item response theory, with each subtest borrowing information from the other subtests, as well as independent unidimensional scores for each subtest, were also considered. Correlations among scores from all methods were high, though somewhat lower for the independent unidimensional scores. Correlations between course grades and test scores, a measure of validity, were similar for all methods, though again slightly lower for the independent unidimensional scores. To assess bias and RMSE, data were simulated using the parameters estimated for the correlated factors model. The independent unidimensional scores showed the greatest bias and RMSE; the relative performance of the other three methods varied with the subscale.

Scoring Subscales Using Multidimensional Item Response Theory Models

Tests are often designed such that each item measures the primary trait and one additional secondary trait. The secondary traits may reflect different content categories in the test blueprint, or different tests within a battery of tests. In this situation, test users may want subscale scores, each of which reflects both the primary trait and the relevant secondary trait. Two multidimensional item response theory (MIRT) models are potentially useful in this context: a model with n correlated traits, where n is the number of subscales, or a bifactor model with one primary trait and n orthogonal secondary traits. An additional approach, which applies unidimensional IRT in the initial scoring of each subscale but then borrows information from correlated subscales in forming the final subscale scores, could also be used.

In the bifactor model, all items are specified to load on the primary factor, and each item may additionally load on one secondary factor. The factors are orthogonal (Gibbons & Hedeker, 1992; McLeod, Swygert, & Thissen, 2001). In other words, a secondary factor is the common factor a group of items shares beyond their shared association with the primary factor. Hierarchical is a more general term for this class of models; bifactor emphasizes that each item loads on no more than two traits, including the primary trait. With the bifactor model, scores can be estimated for the primary trait and for each secondary trait. On a battery of tests, though, it would seem desirable for each subtest score to be a measure of the overall construct covered by the subtest, not just the part of the construct not covered by the primary factor. In other words, the score should be a combination of the primary trait and the secondary trait, not the secondary trait alone.

To quantify the relative weights of the factors contributing to an item response, Reckase (1985, 1997; Reckase & McKinley, 1991) defined the direction of greatest slope for item i.
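In the usual formulation, sketched here under the assumption that it matches the expression intended in the text, the angle \alpha_{ik} between the direction of greatest slope for item i and coordinate axis k satisfies

\[
\cos \alpha_{ik} = \frac{a_{ik}}{\sqrt{\sum_{j=1}^{m} a_{ij}^{2}}}, \qquad k = 1, \dots, m,
\]

where a_{ij} is the slope (discrimination) parameter of item i on dimension j and m is the number of dimensions. The denominator, \sqrt{\sum_{j=1}^{m} a_{ij}^{2}}, is the multidimensional discrimination (MDISC), the slope of the item response surface in the direction of steepest ascent.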
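As an illustration of the bifactor loading structure described above (not of the scoring methods compared in the study), the following is a minimal sketch in Python/NumPy that simulates dichotomous responses under a bifactor compensatory 2PL model; the item counts, parameter ranges, and function name are illustrative assumptions rather than values from the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulate_bifactor_responses(n_persons=1000, items_per_subscale=(10, 10, 10)):
    """Simulate 0/1 responses under a bifactor compensatory 2PL model.

    Each item loads on the primary (general) factor and on exactly one
    secondary (group) factor; all factors are orthogonal standard normal.
    Parameter values are arbitrary illustrative choices.
    """
    n_items = sum(items_per_subscale)
    n_groups = len(items_per_subscale)

    # Orthogonal latent traits: column 0 = primary, columns 1..n_groups = secondary.
    theta = rng.standard_normal((n_persons, 1 + n_groups))

    # Slopes: every item gets a primary slope and one secondary slope.
    a = np.zeros((n_items, 1 + n_groups))
    a[:, 0] = rng.uniform(0.8, 2.0, n_items)             # primary-factor slopes
    group = np.repeat(np.arange(n_groups), items_per_subscale)
    a[np.arange(n_items), group + 1] = rng.uniform(0.4, 1.2, n_items)

    d = rng.uniform(-1.5, 1.5, n_items)                   # intercepts

    # Compensatory 2PL: P(X = 1) = logistic(theta @ a' + d).
    logits = theta @ a.T + d
    p = 1.0 / (1.0 + np.exp(-logits))
    responses = (rng.uniform(size=p.shape) < p).astype(int)
    return responses, theta, a, d

responses, theta, a, d = simulate_bifactor_responses()
print(responses.shape)  # (1000, 30)

Replacing the single primary factor with n correlated traits (and dropping the secondary columns) would give the corresponding correlated-factors data-generating structure.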
References

[1] Terry A. Ackerman (1991). The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing.
[2] Robert D. Gibbons & Donald Hedeker (1992). Full-information item bi-factor analysis.
[3] Mark D. Reckase (1997). A Linear Logistic Multidimensional Model for Dichotomous Item Response Data.
[4] Timothy R. Miller (1991). Empirical Estimation of Standard Errors of Compensatory MIRT Model Parameters Obtained from the NOHARM Estimation Program. ACT Research Report Series.
[5] R. P. McDonald (1999). Test Theory: A Unified Treatment.
[6] Mark D. Reckase (1985). The Difficulty of Test Items That Measure More Than One Ability.
[7] Mark D. Reckase & Robert L. McKinley (1991). The Discriminating Power of Items That Measure More Than One Dimension.
[8] D. Thissen et al. (2000). Factor analysis for items scored in two categories.
[9] M. Gessaroli et al. (1998). Assessing the Dimensionality of Item Response Matrices with Small Sample Sizes and Short Test Lengths.
[10] D. Knol et al. (1991). Empirical Comparison Between Factor Analysis and Multidimensional Item Response Models. Multivariate Behavioral Research.
[11] George Engelhard et al. (1985). Full-Information Item Factor Analysis: Applications of EAP Scores.
[12] E. Muraki et al. (1988). Full-Information Item Factor Analysis.
[13] Roderick P. McDonald et al. (1997). Normal-Ogive Multidimensional Model.