A Hierarchical IRT Model for Criterion-Referenced Measurement

A hierarchical IRT model is proposed for mastery classification in criterion-referenced measurement. In this model, items measuring the same criterion are grouped, and a difficulty parameter and a discrimination parameter are estimated for each criterion on the same scale as the person and item parameters. A student's level of proficiency with respect to a criterion is expressed as the probability of success on that criterion. Cutoff points on this probability scale can then be used to classify respondents as masters or nonmasters. The model is estimated with the Gibbs sampler, and its fit is assessed with posterior predictive checks. The approach is illustrated with a test measuring the attainment targets for reading comprehension (in Dutch) at the end of primary education.
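The classification step can be sketched in a few lines. The abstract does not state the link function, so this minimal example assumes a two-parameter logistic form for the criterion-level success probability; the function names, the parameter values, and the cutoff of 0.8 are all hypothetical and chosen only for illustration.

```python
import math


def criterion_success_probability(theta, discrimination, difficulty):
    """Probability of success on a criterion for a person with ability theta.

    Assumption: a two-parameter logistic form. The original model may use a
    different link (for example a normal ogive).
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def classify(theta, discrimination, difficulty, cutoff=0.8):
    """Label a respondent as master or nonmaster on one criterion.

    The cutoff is a point on the probability scale; 0.8 is a made-up value.
    """
    p = criterion_success_probability(theta, discrimination, difficulty)
    return "master" if p >= cutoff else "nonmaster"


# Hypothetical ability estimates for three students.
students = {"A": -0.5, "B": 0.4, "C": 1.6}

# Hypothetical criterion parameters on the same scale as the abilities.
labels = {
    name: classify(theta, discrimination=1.2, difficulty=0.0)
    for name, theta in students.items()
}
print(labels)
```

In a full Bayesian analysis the success probability would be computed for each posterior draw of the person and criterion parameters, so that the classification carries its own uncertainty; the sketch uses point values only.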
