Model Selection for Zero-inflated Regression with Missing Covariates

Computational Statistics and Data Analysis

Count data arise widely in medical trials, public health, surveys and environmental studies. In analyzing count data, it is important to determine whether zero-inflation is present and how to select the most suitable model. However, the classical Akaike information criterion (AIC) for model selection is not valid when some observations are missing. In this paper, we develop a new model selection criterion, in line with AIC, for zero-inflated regression models with missing covariates. The method is a modified version of the Monte Carlo EM algorithm, which is based on the data augmentation scheme. One of its main attractions is that it can be used to compare candidate models regardless of whether data are missing. Moreover, it is very simple to compute, since it is a by-product of the Monte Carlo EM algorithm once the parameter estimates have been obtained. A simulation study and a real example are used to illustrate the proposed methodology.
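As a point of reference for the complete-data case only, the following minimal sketch compares an intercept-only Poisson model with an intercept-only zero-inflated Poisson (ZIP) model using the classical AIC. It does not implement the criterion proposed in the paper, which handles missing covariates through Monte Carlo EM. The counts, the function names and the simple EM fit without covariates are all assumptions made for illustration.

```python
import math

def poisson_loglik(y, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)

def zip_loglik(y, lam, pi):
    """Log-likelihood of i.i.d. zero-inflated Poisson counts.

    pi is the probability of a structural zero; otherwise the count
    is drawn from Poisson(lam).
    """
    total = 0.0
    for k in y:
        if k == 0:
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total += (math.log(1.0 - pi) + k * math.log(lam)
                      - lam - math.lgamma(k + 1))
    return total

def fit_zip_em(y, n_iter=200):
    """Fit an intercept-only ZIP model with a plain EM algorithm."""
    n = len(y)
    n_zero = sum(1 for k in y if k == 0)
    total = sum(y)
    pi, lam = 0.5, max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: closed-form updates for the mixing weight and Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return lam, pi

def aic(loglik, n_params):
    """Classical AIC: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params

# Hypothetical, fully observed counts with many more zeros than a Poisson predicts.
y = [0] * 60 + [1] * 8 + [2] * 12 + [3] * 10 + [4] * 6 + [5] * 4

lam_pois = sum(y) / len(y)          # Poisson MLE is the sample mean
lam_hat, pi_hat = fit_zip_em(y)

aic_poisson = aic(poisson_loglik(y, lam_pois), n_params=1)
aic_zip = aic(zip_loglik(y, lam_hat, pi_hat), n_params=2)

print("AIC Poisson:", round(aic_poisson, 2))
print("AIC ZIP:    ", round(aic_zip, 2))
```

The model with the smaller AIC is preferred. With missing covariates the observed-data log-likelihood used here is no longer directly available, which is the gap the proposed criterion is meant to address.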
