The Analysis of Count Data: A Gentle Introduction to Poisson Regression and Its Alternatives

Count data reflect the number of occurrences of a behavior in a fixed period of time (e.g., number of aggressive acts by children during a playground period). When the outcome variable is a count with a low arithmetic mean (typically < 10), standard ordinary least squares regression may produce biased results. We provide an introduction to regression models that are appropriate for count data. We introduce standard Poisson regression with an example and discuss its interpretation. We then introduce two variants, overdispersed Poisson regression and negative binomial regression, that may provide better results when a key assumption of standard Poisson regression is violated. We also discuss two problems: excess zeros, in which the sample includes a subgroup of respondents who would never display the behavior, and truncated zeros, in which respondents who have a zero count are excluded by the sampling plan. We provide computer syntax for our illustrations in SAS and SPSS. The Poisson family of regression models provides improved and now easy-to-implement analyses of count data. [Supplementary materials are available for this article. Go to the publisher's online edition of Journal of Personality Assessment for the following free supplemental resources: the data set used to illustrate Poisson regression in this article, which is available in three formats: a text file, an SPSS database, or a SAS database.]
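As a rough illustration of the ideas summarized above, the following Python sketch fits a one-predictor Poisson regression, ln(mu) = b0 + b1*x, and computes a Pearson dispersion statistic. It is not the article's syntax (which is in SAS and SPSS) and does not use the article's data set: the data are simulated, and the names `poisson_draw`, `fit_poisson`, and `pearson_dispersion`, along with the chosen coefficients and sample size, are assumptions made for this example. The dispersion statistic relates to the assumption that the conditional variance equals the conditional mean; values well above 1 would suggest overdispersion and point toward the overdispersed Poisson or negative binomial variants.

```python
import math
import random


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) count using Knuth's multiplication method."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_poisson(x, y, iterations=25):
    """Fit ln(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start at the log of the sample mean
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2 x 2 and symmetric
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1, n_params=2):
    """Pearson chi-square divided by residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Simulated data (hypothetical): true b0 = 0.5, true b1 = 0.8
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
y = [poisson_draw(rng, math.exp(0.5 + 0.8 * xi)) for xi in x]

b0, b1 = fit_poisson(x, y)
dispersion = pearson_dispersion(x, y, b0, b1)
rate_ratio = math.exp(b1)  # multiplicative change in expected count per unit of x

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, exp(b1) = {rate_ratio:.3f}")
print(f"Pearson dispersion = {dispersion:.3f}")
```

Because the coefficients are on the log scale, exp(b1) is read as the factor by which the expected count is multiplied for a one-unit increase in the predictor. In practice a statistical package would normally be used instead of a hand-written Newton-Raphson loop; the loop is shown here only to keep the sketch self-contained.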
