As the goals of educational assessment evolve from the strictly evaluative to the diagnostically useful, so also evolve the statistical methods used to build, validate, and interpret educational tests. The methods discussed in this special issue all approach diagnosis in an item response theory (IRT) related way, with models that are parameterized at the item level and that extract information from individual item responses. Clearly, their most distinguishing feature is their more complex, multidimensional representation of examinee proficiency. This representation can be built directly into an item response model (as is seen most clearly in Almond, DiBello, Moulder, & Zapata-Rivera, 2007; Henson, Templin, & Douglas, 2007; Roussos, Templin, & Henson, 2007; Stout, 2007) or else it can provide a framework for interpreting (residual) patterns in item responses (as is seen in Gierl, 2007).

The complexity of the proficiency space introduces corresponding complexities into the statistical modeling and score reporting aspects of diagnosis. A high level of expert judgment is needed in formulating appropriate models. One of the primary challenges in implementing IRT-based cognitively diagnostic models (ICDMs) is determining which aspects of the modeling process should be constrained through expert judgment and which can and should be informed by observed item response data. The vast array of psychometric models now available for diagnosis, and the different ways they handle these complexities (e.g., how many levels each skill has, how skills interact, how skill mastery translates to item performance), make model selection a central issue. At the same time, it can be challenging to compare models according to goodness of fit because of the many other aspects within each model that must be informed by experts (e.g., entries of the item-by-skill Q matrix, structure of the proficiency space), and data-driven model re-specification is often messy.

Collectively, the papers presented in this special issue provide a comprehensive overview of the state of the art in IRT-based diagnosis. While all emphasize a common end-goal of examinee diagnosis, the process by which this is achieved and the balance of data-driven and expert-driven decision making used along the way introduce important differences.
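For readers less familiar with how an item-by-skill Q matrix ties skill mastery to item performance, the following minimal Python sketch illustrates the conjunctive DINA rule, one of the simpler models in the ICDM family discussed in this literature. The Q matrix, skill profile, and slip/guess values below are invented for illustration and do not correspond to any particular paper's method.

import numpy as np

# Invented illustration: Q matrix for 3 items x 2 skills (1 = item requires the skill).
Q = np.array([[1, 0],
              [0, 1],
              [1, 1]])

# One examinee's latent skill-mastery profile (1 = skill mastered).
alpha = np.array([1, 0])

# Item-level slip (incorrect despite mastery) and guess (correct without mastery) rates.
slip = np.array([0.10, 0.15, 0.20])
guess = np.array([0.25, 0.20, 0.10])

# Conjunctive (DINA-style) rule: an examinee is "prepared" for an item only if
# every skill the Q matrix requires for that item has been mastered.
eta = np.all(alpha >= Q, axis=1).astype(int)

# Skill mastery translates to item performance through the slip and guess parameters.
p_correct = np.where(eta == 1, 1 - slip, guess)
print(p_correct)  # prints [0.9 0.2 0.1]

In this sketch the expert-specified ingredients are the Q matrix and the conjunctive rule, while the slip and guess parameters are the kind of quantities that would ordinarily be estimated from observed item response data, which mirrors the expert-driven versus data-driven division discussed above.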
[1] Templin, J., et al. (2007). Using Efficient Model-Based Sum-Scores for Conducting Skills Diagnoses.
[2] Stout, W., et al. (2007). Skills Diagnosis Using IRT-Based Continuous Latent Trait Models.
[3] Junker, B., et al. (2001). Cognitive Assessment Models with Few Assumptions, and Connections with Nonparametric Item Response Theory.
[4] Fischer, G. H., et al. (1973). The linear logistic test model as an instrument in educational research.
[5] De Boeck, P., et al. (2002). The Random Weights Linear Logistic Test Model.
[6] Templin, J., et al. (2007). Skills Diagnosis Using IRT-Based Latent Class Models.
[7] Almond, R. G., et al. (2007). Modeling Diagnostic Assessments with Bayesian Networks.
[8] Gierl, M. J., et al. (2007). Making Diagnostic Inferences about Cognitive Attributes Using the Rule-Space Model and Attribute Hierarchy Method.
[9] Douglas, J. A., et al. (2004). Higher-order latent trait models for cognitive diagnosis.