The Alabama Department of Public Safety has developed and maintains a centralized database of traffic accident data collected from crash reports completed by local police officers and state troopers. The Critical Analysis Reporting Environment (CARE), developed by Dr. David Brown and the Computer Science Department of the University of Alabama, provides web-based access to this database along with some basic statistical summary capabilities. In this research project, the authors used existing multivariate data exploration tools to search the data for interesting and useful information that might lead to improved highway safety. Their analysis uncovered numerous data entry and variable definition problems in the Alabama Accident Databases. This report describes these problems and makes recommendations for improving future data collection and the CARE system. Ultimately, these data quality issues led the authors to conclude that meaningful statistical modeling of the existing data to predict injuries and fatalities in traffic accidents is not feasible until the problems are corrected.