Analyzing data with clumping at zero. An example demonstration.

This article demonstrates two approaches to analyzing the relationship between multiple covariates and an outcome that has a high proportion of zero values. The first approach categorizes the continuous outcome, with zero as its own category, and then fits a proportional odds model. The second approach uses logistic regression to model the probability of a zero response and ordinary least squares linear regression to model the non-zero continuous responses. Both approaches are demonstrated on outcome data on hours of care received from the Springfield Elder Project. A crude linear model fitted to the zero and non-zero values together is included for comparison. We conclude that the choice of approach depends on the data: if the proportional odds assumption is valid, the proportional odds model appears to be the method of choice; otherwise, the combination of logistic regression and a linear model is preferable.
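The second approach (a logistic model for the zeros plus a linear model for the non-zero responses) can be sketched in a few lines. The following is a minimal illustration only, not the authors' analysis: the data are simulated, the single covariate `x`, the coefficient values, and the helper names `fit_logistic` and `fit_two_part` are all assumptions made for the example, and the non-zero responses are modeled on the raw scale without any transformation. A crude linear model on all observations is fitted alongside for comparison.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_two_part(X, y):
    """Part 1: logistic model for P(y == 0). Part 2: OLS on the non-zero responses."""
    beta_zero = fit_logistic(X, (y == 0).astype(float))
    nonzero = y > 0
    beta_pos, *_ = np.linalg.lstsq(X[nonzero], y[nonzero], rcond=None)
    return beta_zero, beta_pos


# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Assumed true model: logit P(zero) = 0.5 - 1.0 * x; non-zero mean = 20 + 3 * x.
p_zero_true = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * x)))
is_zero = rng.random(n) < p_zero_true
hours = np.where(is_zero, 0.0, 20.0 + 3.0 * x + rng.normal(size=n))

beta_zero, beta_pos = fit_two_part(X, hours)

# Crude linear model that ignores the clump at zero.
beta_crude, *_ = np.linalg.lstsq(X, hours, rcond=None)

# Overall expected outcome under the two-part model: P(non-zero) * E[y | y > 0].
p_zero_hat = 1.0 / (1.0 + np.exp(-(X @ beta_zero)))
expected_hours = (1.0 - p_zero_hat) * (X @ beta_pos)

print("logistic (zero vs non-zero):", beta_zero)
print("OLS on non-zero responses:  ", beta_pos)
print("crude OLS on all responses: ", beta_crude)
```

In this simulation the two-part fit estimates the zero and non-zero components separately, while the crude model's intercept is pulled toward zero by the clump of zero responses. The first approach (categorizing the outcome and fitting a proportional odds model) would need an ordinal regression routine and is not shown here.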
