Statistical challenges with modeling motor vehicle crashes: Understanding the implications of alternative approaches

There has been considerable research over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or Negative Binomial), Zero-Inflated Poisson and Negative Binomial (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions behind each, making an intelligent choice for modeling motor vehicle crash data is difficult at best. The literature offers little discussion comparing different statistical modeling approaches, identifying which statistical models are most appropriate for crash data, or providing a strong justification from basic crash principles. In recent years, for example, it has been suggested that the motor vehicle crash process can be successfully modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings) exist in one of two states: perfectly safe or unsafe. As a result, ZIP and ZINB models have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process and indicate how well they statistically approximate it. We also present the theory behind dual-state count models and note why they have become popular for modeling crash data.
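The claim that crashes at a site follow Poisson trials can be made concrete with a small calculation. The sketch below is illustrative only and assumes a hypothetical site with 1,000 vehicle passages, each carrying its own small crash probability; the helper names are not from the study. It builds the exact distribution of the crash count (a Poisson-binomial distribution), compares it with a Poisson distribution of the same mean, and checks the gap against Le Cam's inequality, which bounds the summed absolute difference by twice the sum of squared trial probabilities.

```python
import math
import random


def poisson_binomial_pmf(probs):
    """Exact pmf of S = sum of independent Bernoulli(p_i) trials with
    unequal p_i (Poisson trials), built up one trial at a time."""
    pmf = [1.0]  # before any trial, S = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # no crash on this passage
            nxt[k + 1] += mass * p      # crash on this passage
        pmf = nxt
    return pmf


def poisson_pmf(k, lam):
    """Poisson probability computed on the log scale to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


# Hypothetical site: 1,000 passages with small, unequal crash probabilities
# (illustrative values, not taken from any dataset).
rng = random.Random(42)
probs = [rng.uniform(0.0, 2e-3) for _ in range(1000)]

pmf = poisson_binomial_pmf(probs)
lam = sum(probs)                                # E[S] = sum of p_i
var = sum(p * (1.0 - p) for p in probs)         # Var[S] = sum of p_i(1 - p_i)
l1_gap = sum(abs(pmf[k] - poisson_pmf(k, lam)) for k in range(len(pmf)))
le_cam_bound = 2.0 * sum(p * p for p in probs)  # Le Cam's inequality

print(f"mean={lam:.4f} var={var:.4f}")
print(f"P(S=0): exact={pmf[0]:.5f}, Poisson={math.exp(-lam):.5f}")
print(f"L1 gap={l1_gap:.2e}, Le Cam bound={le_cam_bound:.2e}")
```

Two points follow from the formulas rather than from any particular data: the variance of a Poisson-trials count is slightly smaller than its mean, and the Poisson is close only because every trial probability is small. In this view the Poisson is an approximation to the crash process, not the process itself.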
A simulation experiment is then conducted to demonstrate how the crash process gives rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales, not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (for observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
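A minimal simulation in the spirit of this argument is sketched below. It is not a reproduction of the study's experiment: the number of sites, the exposure range, and the gamma-distributed crash probabilities are all hypothetical choices. Every site has a positive per-passage crash probability, so no site is “perfectly safe”, yet at low exposure the share of zero counts exceeds what a single Poisson model with the same mean would predict. The exact expected zero share, the mean over sites of (1 − p_i)^(N_i·T), also shows how lengthening the time window T shrinks the zeros.

```python
import math
import random


def draw_sites(n_sites, seed=0):
    """Hypothetical sites: annual exposure N_i (vehicle passages, kept low on
    purpose) and a per-passage crash probability p_i drawn from a gamma
    distribution to represent unobserved heterogeneity. Gamma draws are
    positive, so no site is in a 'perfectly safe' state."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        n_i = rng.randint(50, 500)
        p_i = min(rng.gammavariate(0.25, 4e-3), 1.0)  # mean p_i = 1e-3
        sites.append((n_i, p_i))
    return sites


def simulate_counts(sites, years, seed=1):
    """One crash count per site: a sum of Bernoulli trials over N_i * years passages."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n * years)) for n, p in sites]


def zero_shares(counts):
    """Observed share of zeros vs the share implied by a single Poisson
    model with the same sample mean, exp(-ybar)."""
    observed = sum(c == 0 for c in counts) / len(counts)
    ybar = sum(counts) / len(counts)
    return observed, math.exp(-ybar)


def expected_zero_share(sites, years):
    """Exact expected share of zero-count sites over a window of `years`."""
    return sum((1.0 - p) ** (n * years) for n, p in sites) / len(sites)


sites = draw_sites(4000, seed=7)
counts = simulate_counts(sites, years=1, seed=8)
obs0, pois0 = zero_shares(counts)
print(f"1-year window: observed zeros {obs0:.3f} vs Poisson-implied {pois0:.3f}")
for years in (1, 3, 10):
    print(f"{years:>2}-year window: expected zero share {expected_zero_share(sites, years):.3f}")
```

Because every site can crash, the extra zeros here come from low exposure combined with unobserved heterogeneity, which is the situation a Poisson-gamma (negative binomial) model is designed to capture; a zero-inflated model would instead attribute them to a safe state that does not exist in this simulation.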