On the Complexity of Item Response Theory Models

ABSTRACT Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity also depends on the functional form of the model. We examined four popular IRT models (exploratory factor analytic, bifactor, DINA, and DINO) with different functional forms but the same number of free parameters. For comparison, a simpler (unidimensional 3PL) model was specified such that it had one more free parameter than each of the other four models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, even though it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit certain distinct data patterns well. Applied researchers and psychometricians should therefore consider functional form, and not goodness-of-fit alone, when selecting an IRT model.
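
The abstract does not spell out how data sets were drawn "randomly and uniformly" from the complete data space. One common reading, assumed here purely for illustration, is that the data space for dichotomous items is the probability simplex over all 2^J response patterns, and that a uniform draw from it is a Dirichlet(1, ..., 1) draw. The sketch below follows that reading; the function names, the item count, the respondent count, and the seed are all hypothetical and are not taken from the study.

```python
import random
from itertools import product


def sample_uniform_simplex(n_cells, rng):
    """Draw one point uniformly from the probability simplex.

    Normalizing independent Exponential(1) draws yields a
    Dirichlet(1, ..., 1) vector, which is uniform on the simplex.
    """
    draws = [rng.expovariate(1.0) for _ in range(n_cells)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_data_set(n_items, n_respondents, rng):
    """Generate one random data set of dichotomous item responses.

    Assumption: the "data space" is the set of all probability vectors
    over the 2**n_items possible response patterns.
    """
    # Enumerate every possible 0/1 response pattern.
    patterns = list(product((0, 1), repeat=n_items))
    # Random pattern probabilities, uniform over the simplex.
    probs = sample_uniform_simplex(len(patterns), rng)
    # Draw respondents' patterns according to those probabilities.
    rows = rng.choices(patterns, weights=probs, k=n_respondents)
    return probs, rows


# Hypothetical sizes chosen only to keep the example small.
rng = random.Random(2017)
probs, rows = sample_data_set(n_items=5, n_respondents=200, rng=rng)
```

Repeating the draw 1,000 times and fitting every candidate model to each resulting data set would give a rough picture of fitting propensity: a model that fits most of these unstructured data sets well is flexible in a way its parameter count alone does not reveal.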

[31]  D. Thissen,et al.  Likelihood-Based Item-Fit Indices for Dichotomous Item Response Theory Models , 2000 .

[32]  Ming Li,et al.  Minimum description length induction, Bayesianism, and Kolmogorov complexity , 1999, IEEE Trans. Inf. Theory.

[33]  B. Junker,et al.  Cognitive Assessment Models with Few Assumptions, and Connections with Nonparametric Item Response Theory , 2001 .

[34]  Bin Yu,et al.  Model Selection and the Principle of Minimum Description Length , 2001 .

[35]  D. Bartholomew,et al.  A goodness of fit test for sparse 2p contingency tables. , 2002, British Journal of Mathematical & Statistical Psychology.

[36]  I. J. Myung,et al.  Toward a method of selecting among computational models of cognition. , 2002, Psychological review.

[37]  Maria Orlando,et al.  Further Investigation of the Performance of S - X2: An Item Fit Index for Use With Dichotomous Item Response Theory Models , 2003 .

[38]  Kristian E Markon,et al.  An Empirical Comparison of Information-Theoretic Selection Criteria for Multivariate Behavior Genetic Models , 2004, Behavior genetics.

[39]  Jeffrey A Douglas,et al.  Higher-order latent trait models for cognitive diagnosis , 2004 .

[40]  Peter Grünwald,et al.  A tutorial introduction to the minimum description length principle , 2004, ArXiv.

[41]  Mark A. Pitt,et al.  Model Evaluation, Testing and Selection , 2005 .

[42]  D. Thissen,et al.  Limited-information goodness-of-fit testing of item response theory models for sparse 2 tables. , 2006, The British journal of mathematical and statistical psychology.

[43]  J. Templin,et al.  Measurement of psychological disorders using cognitive diagnosis models. , 2006, Psychological methods.

[44]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[45]  M. Lee,et al.  Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference , 2006 .

[46]  Kristopher J Preacher,et al.  Quantifying Parsimony in Structural Equation Modeling , 2006, Multivariate behavioral research.

[47]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[48]  J. Templin,et al.  The Effects of Q-Matrix Misspecification on Parameter Estimates and Classification Accuracy in the DINA Model , 2008 .

[49]  J. Templin,et al.  Unique Characteristics of Diagnostic Classification Models: A Comprehensive Review of the Current State-of-the-Art , 2008 .

[50]  Kevin D. Wu,et al.  Anxiety as a context for understanding associations between hypochondriasis, obsessive-compulsive, and panic attack symptoms. , 2010, Behavior therapy.

[51]  Jonathan Templin,et al.  Diagnostic Measurement: Theory, Methods, and Applications , 2010 .

[52]  Jean-Paul Fox,et al.  Bayesian Item Response Modeling , 2010 .

[53]  Jorma Rissanen,et al.  Minimum Description Length Principle , 2010, Encyclopedia of Machine Learning.

[54]  Li Cai,et al.  HIGH-DIMENSIONAL EXPLORATORY ITEM FACTOR ANALYSIS BY A METROPOLIS–HASTINGS ROBBINS–MONRO ALGORITHM , 2010 .

[55]  J. Fox Bayesian Item Response Modeling: Theory and Applications , 2010 .

[56]  E. Walker,et al.  Diagnostic and Statistical Manual of Mental Disorders , 2013 .

[57]  M. Thomas Rewards of bridging the divide between measurement and clinical theory: demonstration of a bifactor model for the Brief Symptom Inventory. , 2012, Psychological assessment.

[58]  S. Reise The Rediscovery of Bifactor Measurement Models , 2012 .

[59]  D. Thissen,et al.  Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling , 2013 .

[60]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[61]  M. Carter Diagnostic and Statistical Manual of Mental Disorders, 5th ed. , 2014 .

[62]  Z. Ying,et al.  Statistical Analysis of Q-Matrix Based Diagnostic Classification Models , 2015, Journal of the American Statistical Association.

[63]  S. Reise,et al.  Applying Bifactor Statistical Indices in the Evaluation of Psychological Measures , 2016, Journal of personality assessment.

[64]  Li Cai,et al.  Summed Score Likelihood–Based Indices for Testing Latent Variable Distribution Fit in Item Response Theory , 2017, Educational and psychological measurement.