An Investigation of the Item Parameter Recovery Characteristics of a Gibbs Sampling Procedure

The item parameter recovery characteristics of a Gibbs sampling method (Albert, 1992) for IRT item parameter estimation were investigated in a simulation study. Item parameters were estimated under a normal ogive item response function model using Gibbs sampling and BILOG (Mislevy & Bock, 1989). The item parameter estimates were then equated to the metric of the underlying item parameters for tests of 10, 20, 30, and 50 items and samples of 30, 60, 120, and 500 examinees. Summary statistics of the equating coefficients showed that Gibbs sampling and BILOG both produced trait scale metrics whose units of measurement were too small but whose midpoints were properly located. When expressed in a common metric, the biases of the BILOG estimates of the item discriminations were uniformly smaller and less variable than those from Gibbs sampling. The biases of the item difficulty estimates yielded by the two estimation procedures were small and similar to each other. In addition, the item parameter recovery characteristics of the two procedures were comparable for the largest data set of 50 items and 500 examinees. However, for short tests and small samples, the item parameter recovery characteristics of BILOG were superior to those of the Gibbs sampling approach.
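The equating and bias steps described above can be illustrated with a minimal sketch. It assumes the usual linear relation between metrics, with slope A and intercept K, so that difficulties transform as b* = A·b + K and discriminations as a* = a / A. The coefficients here are obtained with the simple mean/sigma method, which is a stand-in for illustration and not the characteristic curve method implemented in EQUATE. All function names and parameter values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_coefficients(b_est, b_true):
    """Slope A and intercept K that map the estimated metric onto the true one.

    Mean/sigma method: match the mean and standard deviation of the
    difficulty estimates to those of the underlying difficulties.
    """
    A = pstdev(b_true) / pstdev(b_est)
    K = mean(b_true) - A * mean(b_est)
    return A, K


def to_target_metric(a_est, b_est, A, K):
    """Express item parameter estimates in the metric of the true parameters."""
    a_star = [a / A for a in a_est]
    b_star = [A * b + K for b in b_est]
    return a_star, b_star


def bias(estimates, true_values):
    """Mean signed difference between equated estimates and true values."""
    return mean(e - t for e, t in zip(estimates, true_values))


# Hypothetical generating parameters for a four-item test.
a_true = [0.8, 1.0, 1.2, 1.5]
b_true = [-1.0, -0.5, 0.5, 1.0]

# Hypothetical estimates reported on a different metric.
a_est = [0.45, 0.5, 0.6, 0.75]
b_est = [-1.8, -0.8, 1.2, 2.2]

A, K = mean_sigma_coefficients(b_est, b_true)
a_star, b_star = to_target_metric(a_est, b_est, A, K)

print("slope A:", round(A, 3), "intercept K:", round(K, 3))
print("discrimination bias:", round(bias(a_star, a_true), 3))
print("difficulty bias:", round(bias(b_star, b_true), 3))
```

In a simulation study of this kind, the same computation would be repeated for each replication and each test length by sample size condition, and the equating coefficients and biases would then be summarized across replications.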

[1] Fritz Drasgow, et al. Recovery of Two- and Three-Parameter Logistic Item Characteristic Curves: A Monte Carlo Study, 1982.

[2] Adrian F. M. Smith, et al. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discus, 1993.

[3] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Frank B. Baker. EQUATE 2.0: A Computer Program for the Characteristic Curve Method of IRT Equating, 1993.

[5] Melvin R. Novick, et al. Some latent trait models and their use in inferring an examinee's ability, 1966.

[6] Robert J. Mislevy, et al. BILOG 3: item analysis and test scoring with binary logistic models, 1990.

[7] William H. Press, et al. Numerical recipes, 1990.

[8] R. Kirk. Experimental Design: Procedures for the Behavioral Sciences, 1970.

[9] J. Albert. Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling, 1992.

[10] M. R. Novick, et al. Statistical Theories of Mental Test Scores, 1971.

[11] F. Baker. Some Observations on the Metric of PC-BILOG Results, 1990.

[12] W. M. Yen. Using Simulation Results to Choose a Latent Trait Model, 1981.

[13] G. Casella, et al. Explaining the Gibbs Sampler, 1992.

[14] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[15] Adrian F. M. Smith, et al. Sampling-Based Approaches to Calculating Marginal Densities, 1990.

[16] Martha L. Stocking, et al. Developing a Common Metric in Item Response Theory, 1983.

[17] David B. Dunson, et al. Bayesian Data Analysis, 2010.

[18] Frank B. Baker, et al. EQUATE: A Computer Program for the Test Characteristic Curve Method of IRT Equating, 1991.

[19] Bradley P. Carlin, et al. Markov Chain Monte Carlo convergence diagnostics: a comparative review, 1996.