Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets

ABSTRACT In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated code that shows how to estimate 4PM item and person parameters in the mirt package (Chalmers, 2012).
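As an illustration of the item response functions mentioned above, the following minimal sketch evaluates a 4PM response function and simulates dichotomous responses from it. It assumes the common logistic parameterization with discrimination a, difficulty b, lower asymptote c, and upper asymptote d (no 1.7 scaling constant). The function names, item parameter values, and sample size are hypothetical and are not taken from the study or its supplemental code.

```python
import math
import random


def irf_4pm(theta, a, b, c, d):
    """Probability of a keyed response under an assumed 4PM parameterization:
    P(theta) = c + (d - c) / (1 + exp(-a * (theta - b)))."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, seed=0):
    """Draw a persons-by-items matrix of 0/1 responses from the 4PM."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < irf_4pm(theta, *item) else 0 for item in items]
        for theta in thetas
    ]


# Hypothetical item parameters (a, b, c, d), chosen only for illustration.
items = [(1.5, -0.5, 0.10, 0.95), (0.8, 1.0, 0.05, 0.85)]

# Hypothetical sample of 1,000 trait values drawn from a standard normal.
theta_rng = random.Random(1)
thetas = [theta_rng.gauss(0.0, 1.0) for _ in range(1000)]

data = simulate_responses(thetas, items)
```

Under this parameterization the response probability approaches c for very low trait values and d for very high ones, and equals (c + d) / 2 at theta = b. When d = 1, the function reduces to the three-parameter model.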

[1]  John B. Carroll,et al.  The effect of difficulty and chance success on correlations between items or between tests , 1945 .

[2]  F. Lord Applications of Item Response Theory To Practical Testing Problems , 1980 .

[3]  Ke-Hai Yuan,et al.  Information Matrices and Standard Errors for MLEs of Item Parameters in IRT , 2014, Psychometrika.

[4]  H. Swaminathan,et al.  Bayesian estimation in the two-parameter logistic model , 1985 .

[5]  Tiffany A. Whittaker,et al.  The Impact of Varied Discrimination Parameters on Mixed-Format Item Response Theory Model Selection , 2013 .

[6]  Melvin R. Novick,et al.  Some latent trait models and their use in inferring an examinee's ability , 1966 .

[7]  S. Hathaway,et al.  MMPI-2 : Minnesota Multiphasic Personality Inventory-2 : manual for administration and scoring , 1989 .

[8]  S. Culpepper Revisiting the 4-Parameter Item Response Model: Bayesian Estimation and Application , 2016, Psychometrika.

[9]  E. Muraki,et al.  Full-Information Item Factor Analysis , 1988 .

[10]  Raymond B. Cattell,et al.  The scientific nature of factors: A demonstration by cups of coffee , 2007 .

[11]  Stable Response Functions with Unstable Item Parameter Estimates , 2002 .

[12]  J. Carroll The nature of the data, or how to choose a correlation coefficient , 1961 .

[13]  Eric Loken,et al.  Estimation of a four-parameter item response theory model. , 2010, The British journal of mathematical and statistical psychology.

[14]  Magnus Lie Hetland Simulating Ability: Representing Skills in Games , 2013, SGDA.

[15]  Richard J. Patz,et al.  A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models , 1999 .

[16]  Fritz Drasgow,et al.  Recovery of Two- and Three-Parameter Logistic Item Characteristic Curves: A Monte Carlo Study , 1982 .

[17]  N G Waller,et al.  A Method for Generating Simulated Plasmodes and Artificial Test Clusters with User-Defined Shape, Size, and Orientation. , 1999, Multivariate behavioral research.

[18]  Allan S. Cohen,et al.  IRT Model Selection Methods for Dichotomous Items , 2007 .

[19]  Ying Cheng,et al.  The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing , 2015, Applied psychological measurement.

[20]  Martha L. Stocking,et al.  Specifying optimum examinees for item parameter estimation in item response theory , 1990 .

[21]  Cosma Rohilla Shalizi,et al.  Philosophy and the practice of Bayesian statistics. , 2010, The British journal of mathematical and statistical psychology.

[22]  R. Cattell,et al.  The uniqueness and significance of simple structure demonstrated by contrasting organic “natural structure” and “random structure” data , 1963 .

[23]  Li Cai,et al.  A Cautionary Note on Using G2(dif) to Assess Relative Model Fit in Categorical Data Analysis , 2006, Multivariate behavioral research.

[24]  N. Waller,et al.  Abstract: Estimation of the 4-Parameter Model with Marginal Maximum Likelihood , 2014, Multivariate behavioral research.

[25]  Martha L. Stocking,et al.  Developing a Common Metric in Item Response Theory , 1982 .

[26]  M. Browne,et al.  A Quasi-Parametric Method for Fitting Flexible Item Response Functions , 2015 .

[27]  D. Magis A Note on the Item Information Function of the Four-Parameter Logistic Model , 2013 .

[28]  G. Skaggs,et al.  A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model , 1989 .

[29]  Implementation of Marginal Bayesian Estimation with Four-Parameter Beta Prior Distributions , 1997 .

[30]  D. D. Gruijter A comment on ‘some standard errors in item response theory’ , 1984 .

[31]  Frederic M. Lord,et al.  An Analysis of the Verbal Scholastic Aptitude Test Using Birnbaum's Three-Parameter Logistic Model , 1968 .

[32]  J. Ramsay Kernel smoothing approaches to nonparametric item characteristic curve estimation , 1991 .

[33]  Frederic M. Lord,et al.  An Upper Asymptote for the Three-Parameter Logistic Item-Response Model. , 1981 .

[34]  John K Kruschke,et al.  Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.

[35]  R. D. Bock,et al.  Adaptive EAP Estimation of Ability in a Microcomputer Environment , 1982 .

[36]  Furong Gao,et al.  Bayesian or Non-Bayesian: A Comparison Study of Item Parameter Estimation in the Three-Parameter Logistic Model , 2005 .

[37]  D. Lawley,et al.  XXIII.—On Problems connected with Item Selection and Test Construction , 1943, Proceedings of the Royal Society of Edinburgh. Section A. Mathematical and Physical Sciences.

[38]  F. Lord A theory of test scores. , 1952 .

[39]  R. Hambleton,et al.  Item Response Theory , 1984, The History of Educational Measurement.

[40]  Kelly L. Rulison,et al.  I've Fallen and I Can't Get Up: Can High-Ability Students Recover From Early Mistakes in CAT? , 2009, Applied psychological measurement.

[41]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[42]  R. Philip Chalmers,et al.  mirt: A Multidimensional Item Response Theory Package for the R Environment , 2012 .

[43]  Niels G. Waller,et al.  Measuring psychopathology with non-standard IRT models: Fitting the four-parameter model to the MMPI , 2010 .

[44]  Janice A. Gifford,et al.  Bayesian estimation in the three-parameter logistic model , 1986 .

[45]  George Engelhard,et al.  Full-Information Item Factor Analysis: Applications of EAP Scores , 1985 .

[46]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[47]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm , 1981 .

[48]  F. Drasgow An Evaluation of Marginal Maximum Likelihood Estimation for the Two-Parameter Logistic Model , 1989 .

[49]  Wim J. van der Linden,et al.  IRT-Based Internal Measures of Differential Functioning of Items and Tests , 1995 .

[50]  Anna Gerber,et al.  Item Response Theory Principles And Applications , 2016 .

[51]  David B. Allison,et al.  Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates , 2008, PLoS genetics.

[52]  Yung-Chin Yen,et al.  An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing , 2012 .

[53]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[54]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[55]  R. Darrell Bock,et al.  Fitting a response model for n dichotomously scored items , 1970 .

[56]  Robert J. Mislevy,et al.  Bayes modal estimation in item response models , 1986 .

[57]  H. Wainer,et al.  Some standard errors in item response theory , 1982 .

[58]  Ilker Yalcin,et al.  Nonlinear factor analysis , 1995 .

[59]  S. Reise,et al.  How many IRT parameters does it take to model psychopathology items? , 2003, Psychological methods.

[60]  S. Reise,et al.  Item response theory for dichotomous assessment data , 2001 .

[61]  Detection of determinant genes and diagnostic via Item Response Theory , 2004 .