Do Larger Sample Sizes Increase the Reliability of Traffic Incident Duration Models? A Case Study of East Tennessee Incidents

Incident duration models are often developed to support incident management and traveler information dissemination. With recent advances in data collection and management, large volumes of archived incident data are now available for model development. However, a large volume of data can present practical challenges, such as the effort required for data processing and computation. In addition, data that span multiple years may be inconsistent because collection environments and procedures change over time. A practical question therefore arises in the incident modeling community: is it really necessary to use all of the available data (“all-in”) to build models? If not, how much data is necessary? To answer these questions, this study investigates the relationship between sample size and the reliability of incident duration models. The study proposes a sample size determination framework and demonstrates it through a case study using data on over 47,000 incidents. Multiple hazard-based duration models were estimated with varying sample sizes. The relationships between sample size and model performance, along with the estimation outcomes (i.e., coefficients and significance levels), were examined and visualized. The results showed that the variation of the estimated coefficients decreases as the sample size increases and stabilizes once the sample size reaches a critical threshold value. This critical threshold value may serve as the recommended sample size. The case study suggested that a sample size of 6,500 is sufficient for a reliable incident duration model. The critical value may vary significantly with different data and model specifications. More implications are discussed in the paper.
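The idea of refitting a model at increasing sample sizes and watching the coefficient variation settle can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes synthetic data, a single hypothetical 0/1 covariate, and a log-normal accelerated failure time model with no censored durations, in which case the coefficient estimate reduces to an ordinary least squares slope on log-duration. The function names, the true coefficient values, and the tolerance used to pick a critical size are all illustrative assumptions.

```python
import random
import statistics


def simulate_incidents(n, beta0=3.0, beta1=0.5, sigma=0.8, seed=0):
    """Synthetic (x, log_duration) pairs. x is a hypothetical 0/1 indicator
    (for example, lane blockage); all parameter values are assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1.0 if rng.random() < 0.3 else 0.0
        log_t = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        data.append((x, log_t))
    return data


def fit_slope(sample):
    """OLS slope of log-duration on x. With no censoring, this is the
    coefficient estimate of a log-normal accelerated failure time model."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def coefficient_spread(data, sizes, replicates=30, seed=1):
    """For each sample size, draw repeated subsamples without replacement
    and return the standard deviation of the estimated slope."""
    rng = random.Random(seed)
    spread = {}
    for n in sizes:
        estimates = [fit_slope(rng.sample(data, n)) for _ in range(replicates)]
        spread[n] = statistics.stdev(estimates)
    return spread


def critical_size(spread, tolerance):
    """Smallest sample size whose coefficient spread is within the
    (assumed) tolerance, or None if no size qualifies."""
    for n in sorted(spread):
        if spread[n] <= tolerance:
            return n
    return None


if __name__ == "__main__":
    incidents = simulate_incidents(20000)
    spread = coefficient_spread(incidents, [200, 1000, 5000])
    for n, sd in sorted(spread.items()):
        print(f"n={n:5d}  sd(slope)={sd:.4f}")
    print("critical size:", critical_size(spread, tolerance=0.05))
```

In practice the tolerance would be chosen from how much coefficient uncertainty is acceptable for the application, and a real analysis would refit the full hazard-based model (with censoring and all covariates) rather than this simplified slope.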
