A misspecification test for finite-mixture logistic models for clustered binary and ordered responses

An alternative to assuming normally distributed random effects when modeling clustered binary and ordered responses is to use a finite mixture. This approach gives rise to a flexible class of generalized linear mixed models for item responses, multilevel data, and longitudinal data. A misspecification test for these finite-mixture models is proposed, based on comparing the marginal and the conditional maximum likelihood estimates of the fixed effects, as in Hausman's test. The asymptotic distribution of the test statistic is derived; it is of chi-squared type, with degrees of freedom equal to the number of covariates that vary within the cluster. The test is simple to perform and may also be used to select the number of mixture components when this number is unknown. The approach is illustrated by a series of simulations and three empirical examples covering the main fields of application.
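To make the comparison of the two estimators concrete, the following is a minimal sketch of a Hausman-type statistic. It assumes the classical Hausman form, in which the variance of the difference between the estimators is approximated by the difference of their estimated covariance matrices; the variance estimator derived in the paper may differ. The function names (`hausman_test`, `chi2_sf`, `solve`) and the numerical inputs are hypothetical, and the estimates are restricted to the coefficients of covariates that vary within clusters.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("variance difference is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def hausman_test(beta_cml, beta_mml, v_cml, v_mml):
    """Return (statistic, df, p-value) for d' (V_cml - V_mml)^{-1} d, with d = beta_cml - beta_mml.

    Assumption: classical Hausman variance V_cml - V_mml, not necessarily the paper's estimator.
    """
    k = len(beta_cml)
    d = [c - m for c, m in zip(beta_cml, beta_mml)]
    v = [[v_cml[i][j] - v_mml[i][j] for j in range(k)] for i in range(k)]
    z = solve(v, d)
    stat = sum(di * zi for di, zi in zip(d, z))
    return stat, k, chi2_sf(stat, k)


# Hypothetical estimates for two within-cluster covariates (illustrative numbers only).
stat, df, p_value = hausman_test(
    beta_cml=[0.5, -0.2],
    beta_mml=[0.4, -0.1],
    v_cml=[[0.02, 0.0], [0.0, 0.03]],
    v_mml=[[0.01, 0.0], [0.0, 0.01]],
)
```

A small p-value would suggest that the finite-mixture specification is inadequate, for instance because too few components were used. In finite samples the difference of the two covariance matrices need not be positive definite, which is one reason a more careful variance estimator may be preferable to this sketch.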
