A Cautionary Note on Using G2(dif) to Assess Relative Model Fit in Categorical Data Analysis

The likelihood ratio test statistic G2(dif) is widely used to compare the fit of nested models in categorical data analysis. In large samples, this statistic follows a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of the two models, but only if the least restrictive model is correctly specified. Yet the statistic is often used in applications without any assessment of the adequacy of the least restrictive model. This may lead to incorrect substantive conclusions, because the large-sample chi-square reference distribution is then no longer appropriate. Instead, the large-sample distribution of G2(dif) depends on the degree of misspecification of the least restrictive model. To illustrate this, we performed a simulation study in which this statistic was used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G2(dif) was found to be robust only when the misspecification of the least restrictive model was small. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before G2(dif) is used to assess relative model fit.
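The following Python sketch illustrates the test described above. It computes G2(dif) for two nested log-linear models on a hypothetical 2×2×2 table: mutual independence (A, B, C) is nested in joint independence (AB, C). These are not the item response theory models used in the simulation study. The counts and all function names (`chi2_sf`, `g2`, `margin`, `fit_independence`, `fit_joint_ab`, `g2_difference_test`) are illustrative assumptions. Only the standard library is used. The chi-square tail probability comes from the standard recurrence for integer degrees of freedom. In line with the argument above, the sketch reports the absolute G² of the least restrictive model alongside G2(dif), because the chi-square p-value for G2(dif) is only justified when that model is correctly specified.

```python
import math
from itertools import product

# Cells (i, j, k) of a hypothetical 2 x 2 x 2 table for binary variables A, B, C.
CELLS = list(product((0, 1), repeat=3))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start at df = 1 or df = 2, then raise df by 2 using the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with a = df / 2.
    if df % 2 == 1:
        sf, a = math.erfc(math.sqrt(y)), 0.5
    else:
        sf, a = math.exp(-y), 1.0
    while a < df / 2.0:
        sf += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return sf


def g2(observed, expected):
    """Likelihood ratio statistic G^2 = 2 * sum O * ln(O / E); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def margin(n, axes):
    """Marginal totals of table n over the given axes, keyed by the sub-index."""
    out = {}
    for cell, count in n.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0) + count
    return out


def fit_independence(n):
    """Mutual independence (A, B, C): E = n_i++ n_+j+ n_++k / N^2, df = 7 - 3 = 4."""
    N = sum(n.values())
    mA, mB, mC = margin(n, (0,)), margin(n, (1,)), margin(n, (2,))
    return {c: mA[c[:1]] * mB[c[1:2]] * mC[c[2:]] / N**2 for c in CELLS}


def fit_joint_ab(n):
    """Joint independence (AB, C): E = n_ij+ n_++k / N, df = 7 - 4 = 3."""
    N = sum(n.values())
    mAB, mC = margin(n, (0, 1)), margin(n, (2,))
    return {c: mAB[c[:2]] * mC[c[2:]] / N for c in CELLS}


def g2_difference_test(n, fit_r, df_r, fit_f, df_f):
    """Compare a restricted model (fit_r, df_r) nested in a less restrictive one (fit_f, df_f)."""
    obs = [n[c] for c in CELLS]
    g2_r = g2(obs, [fit_r[c] for c in CELLS])
    g2_f = g2(obs, [fit_f[c] for c in CELLS])
    g2_dif, df_dif = g2_r - g2_f, df_r - df_f
    return {
        # Absolute fit of the least restrictive model: inspect this first.
        "G2_full": g2_f, "p_full": chi2_sf(g2_f, df_f),
        # Relative fit: the chi-square reference assumes the full model is correct.
        "G2_dif": g2_dif, "df_dif": df_dif, "p_dif": chi2_sf(g2_dif, df_dif),
    }


# Hypothetical counts, for illustration only.
counts = dict(zip(CELLS, [30, 12, 10, 18, 14, 11, 9, 26]))
res = g2_difference_test(counts, fit_independence(counts), 4, fit_joint_ab(counts), 3)
print(res)
```

For these two models, G2(dif) equals the G² for independence in the collapsed A×B table, which provides a quick sanity check. A small `p_full` indicates that the least restrictive model fits poorly, and in that case `p_dif` should not be taken at face value.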
