On performance of parametric and distribution‐free models for zero‐inflated and over‐dispersed count responses

Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) models to accommodate excess zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) describes a two-component population mixture. As with the Poisson and NB, the key difference between the ZIP and ZINB is that the ZINB allows for overdispersion through its NB component, which models the count response for the at-risk group. In practice, however, overdispersion often does not follow the NB distribution, and applying the ZINB to such data yields invalid inference. If the sources of overdispersion are known, other parametric models may be used to model the overdispersion directly. Such models, too, depend on assumed distributions. Further, this approach is not applicable when information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as with a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like generalized estimating equations, the proposed approach requires no elaborate distributional assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data.
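
To make the two-component mixture concrete, the following is a minimal sketch of the ZIP probability mass function and the first two moments of the ZIP and ZINB. It is not the estimation method proposed in the paper. The names `zip_pmf`, `zip_moments`, and `zinb_moments` and the symbols `p` (probability of belonging to the non-risk group), `lam` (Poisson mean for the at-risk group), `mu` (NB mean), and `alpha` (NB dispersion) are assumptions introduced for illustration, and the NB is assumed to have variance `mu + alpha * mu**2`.

```python
import math


def zip_pmf(k, lam, p):
    """P(Y = k) for a zero-inflated Poisson.

    With probability p the subject is in the non-risk group (structural zero);
    otherwise the count follows Poisson(lam). All names are illustrative.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = p if k == 0 else 0.0
    return structural_zero + (1.0 - p) * poisson


def zip_moments(lam, p):
    """Mean and variance of the ZIP mixture."""
    mean = (1.0 - p) * lam
    var = (1.0 - p) * lam * (1.0 + p * lam)
    return mean, var


def zinb_moments(mu, alpha, p):
    """Mean and variance of the ZINB mixture.

    Assumes the at-risk component has variance mu + alpha * mu**2;
    alpha = 0 reduces to the ZIP case.
    """
    mean = (1.0 - p) * mu
    var = (1.0 - p) * mu * (1.0 + alpha * mu + p * mu)
    return mean, var


if __name__ == "__main__":
    lam, p = 2.0, 0.3
    print("P(Y=0):", zip_pmf(0, lam, p))
    print("ZIP mean, variance:", zip_moments(lam, p))
    print("ZINB mean, variance:", zinb_moments(lam, 0.5, p))
```

Under these assumptions, the ZIP variance exceeds its mean whenever `p > 0`, so zero inflation alone induces overdispersion relative to the Poisson; the ZINB adds a further term `alpha * mu` for overdispersion within the at-risk group.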
