Detecting Errors and Imputing Missing Data for Single-Loop Surveillance Systems

Single-loop detectors provide the most abundant source of traffic data in California, but loop data samples are often missing or invalid. A method is described that detects bad data samples and imputes missing or bad samples in real time, forming a complete grid of clean data. The diagnostics algorithm and the imputation algorithm that implement this method are operational on 14,871 loops in six districts of the California Department of Transportation. The diagnostics algorithm detects bad (malfunctioning) single-loop detectors from their volume and occupancy measurements. Its novelty is that it bases decisions on time series of many samples rather than on single samples, as previous approaches did. The imputation algorithm models the relationship between neighboring loops as linear and uses linear regression to estimate the value of missing or bad samples. This approach gives a better estimate than previous methods because it uses historical data to learn how pairs of neighboring loops behave. Detection of bad loops and imputation of loop data are important because they allow algorithms that use loop data to perform their analysis without having to compensate for missing or incorrect data samples.
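
The pairwise regression idea behind the imputation step can be illustrated with a minimal sketch. This is not the deployed implementation. It assumes one ordinary least-squares line per pair of neighboring loops, fitted on historical samples, and it assumes the per-neighbor estimates are combined with a median; the abstract does not specify the combining rule. The function names `fit_pair` and `impute` are hypothetical.

```python
import numpy as np


def fit_pair(neighbor_hist, target_hist):
    """Fit target ~ a + b * neighbor by least squares on historical samples.

    Samples where either loop is missing (NaN) are ignored.
    """
    x = np.asarray(neighbor_hist, dtype=float)
    y = np.asarray(target_hist, dtype=float)
    ok = ~np.isnan(x) & ~np.isnan(y)
    # np.polyfit with degree 1 returns [slope, intercept].
    b, a = np.polyfit(x[ok], y[ok], 1)
    return a, b


def impute(target_hist, neighbor_hists, neighbor_now):
    """Estimate the target loop's current value from its neighbors.

    Each neighbor with a valid current reading contributes one regression
    estimate. Taking the median of those estimates is an assumption made
    for this sketch, not something stated in the source.
    """
    estimates = []
    for hist, now in zip(neighbor_hists, neighbor_now):
        if np.isnan(now):
            continue  # this neighbor is itself missing right now
        a, b = fit_pair(hist, target_hist)
        estimates.append(a + b * now)
    return float(np.median(estimates)) if estimates else float("nan")
```

For example, `impute(target_history, [left_history, right_history], [left_now, right_now])` would return an estimate for the target loop whenever at least one neighbor currently reports a valid value, and NaN otherwise.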