Modelling Data That Exhibit an Excess Number of Zeros: Zero-Inflated Models and Generic Mixture Models

In biomedical research, data generated by a count process often contain an ‘excess’ of zeros (e.g. geographical incidence rates, hospital death rates). Whilst there are strategies for analysing such data, some can yield biased results when the underlying data-generating process is not carefully considered. The problem can be exacerbated where the data are also multilevel, since hierarchical extensions to zero-inflated modelling strategies do not always satisfy the underlying model assumptions. We therefore review zero-inflated modelling strategies for single-level data and show why standard Poisson and binomial zero-inflated models (i.e. where one latent class has a central location of zero) require class membership to be predicted by covariates in the standard regression part of the model. We also introduce generic mixture models and reveal limitations in their interpretation in a number of circumstances. For nested or hierarchical count data with an excess of zeros, the upper-level distributional assumptions of standard multilevel models may not be upheld, so alternative strategies are required; in Chap. 7 we introduce and illustrate the semi-parametric multilevel model as a solution to this problem.
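
As a minimal sketch of what an ‘excess’ of zeros means, the Python fragment below simulates a single-level zero-inflated Poisson outcome with no covariates: a latent class located at zero, with mixing weight pi, combined with an ordinary Poisson class with mean lam, so that P(Y = 0) = pi + (1 - pi) exp(-lam). The function names, the parameter values (pi = 0.3, lam = 2.0), and the sample size are illustrative assumptions and are not taken from the chapter.

```python
import math
import random


def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson: a point mass at zero (weight pi)
    mixed with a Poisson(lam) count (weight 1 - pi)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count by Knuth's multiplication method
    (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_zip(n, pi, lam, seed=0):
    """Draw n zero-inflated Poisson counts: each observation belongs to the
    zero class with probability pi, otherwise to the Poisson class."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(lam, rng)
            for _ in range(n)]


# Illustrative (assumed) parameter values.
pi, lam = 0.3, 2.0
counts = sample_zip(20000, pi, lam)

observed_zero = counts.count(0) / len(counts)   # empirical share of zeros
sample_mean = sum(counts) / len(counts)         # close to (1 - pi) * lam
poisson_zero = math.exp(-sample_mean)           # zeros a plain Poisson with
                                                # the same mean would predict

print(f"observed share of zeros:     {observed_zero:.3f}")
print(f"ZIP model share of zeros:    {zip_pmf(0, pi, lam):.3f}")
print(f"plain Poisson at equal mean: {poisson_zero:.3f}")
```

The gap between the observed share of zeros and the share implied by a plain Poisson distribution with the same mean is the ‘excess’. In a regression setting both pi and lam would typically depend on covariates, which is where the modelling issues discussed above arise.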

[1]  E. Parner,et al.  Surface-specific caries incidence in permanent molars in Danish children. , 2007, European journal of oral sciences.

[2]  A. Groeneveld Longitudinal study of prevalence of enamel lesions in a fluoridated and non-fluoridated area. , 1985, Community dentistry and oral epidemiology.

[3]  M. Vaeth,et al.  Lorenz curves and their use in describing the distribution of 'the total burden' of dental caries in a population. , 2001, Community dental health.

[4]  D. Altman,et al.  Measuring agreement in method comparison studies , 1999, Statistical methods in medical research.

[5]  J. Vermunt,et al.  Latent Gold 4.0 User's Guide , 2005 .

[6]  E. Lesaffre,et al.  Multivariate survival analysis for the identification of factors associated with cavity formation in permanent first molars. , 2005, European journal of oral sciences.

[7]  Dankmar Böhning,et al.  The zero‐inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology , 1999 .

[8]  Richard D. Gill,et al.  Multivariate Survival Analysis , 1993 .

[9]  Sophia Rabe-Hesketh,et al.  Generalized latent variable models: multilevel, longitudinal, and structural equation models , 2004 .

[10]  S. Poulsen,et al.  An evaluation of a hierarchical method of describing the pattern of dental caries attack. , 1974, Community dentistry and oral epidemiology.

[11]  Stephen Senn,et al.  Change from baseline and analysis of covariance revisited , 2006, Statistics in medicine.

[12]  F. Lord A paradox in the interpretation of group comparisons. , 1967, Psychological bulletin.

[13]  D. Holst The relationship between prevalence and incidence of dental caries. Some observational consequences. , 2006, Community dental health.

[14]  John Hinde,et al.  Models for count data with many zeros , 1998 .

[15]  Dankmar Böhning,et al.  Zero‐Inflated Poisson Models and C.A.MAN: A Tutorial Collection of Evidence , 1998 .

[16]  J. Mullahy Specification and testing of some modified count data models , 1986 .

[17]  E. Schwarz,et al.  Patterns of dental caries severity in Chinese kindergarten children. , 1997, Community dentistry and oral epidemiology.

[18]  John Hinde,et al.  Zero-inflated proportion data models applied to a biological control assay , 2000 .

[19]  A. Gittelsohn,et al.  Longitudinal studies of the natural history of caries. II. A life-table study of caries incidence in the permanent teeth. , 1965, Archives of oral biology.

[20]  L. Kinlen,et al.  Evidence from population mixing in British New Towns 1946-85 of an infective basis for childhood leukaemia , 1990, The Lancet.

[21]  D. Hall Zero‐Inflated Poisson and Binomial Regression with Random Effects: A Case Study , 2000, Biometrics.

[22]  S. Zeger,et al.  Longitudinal data analysis using generalized linear models , 1986 .

[23]  Frederic M. Lord,et al.  Statistical adjustments when comparing preexisting groups. , 1969 .

[24]  M. Gilthorpe,et al.  Statistical issues on the analysis of change in follow-up studies in dental research. , 2007, Community dentistry and oral epidemiology.

[25]  Diane Lambert,et al.  Zero-inflacted Poisson regression, with an application to defects in manufacturing , 1992 .

[26]  D. M. Malvitz,et al.  Updated comparison of the caries susceptibility of various morphological types of permanent teeth. , 2003, Journal of public health dentistry.