Smoothing Methods Designed to Minimize the Impact of GPS Random Error on Travel Distance, Speed, and Acceleration Profile Estimates

ABSTRACT

The Georgia Institute of Technology is currently evaluating the feasibility and effectiveness of mileage-based pricing programs as transportation control measures. The research effort provides incentives to study participants who change driving behavior in response to cent/mile pricing (fixed pricing and pricing as a function of congestion level). To estimate vehicle distance traveled and driver behavior (e.g., speed and acceleration profiles), researchers employ in-vehicle GPS devices. The accuracy of estimated mileage accrual, speeds by road classification, and even acceleration rates used in pricing algorithms is paramount. The researchers have applied various data smoothing techniques to the instrumented-vehicle GPS speed data and evaluated the performance of the algorithms in minimizing the impact of GPS random error on speed, acceleration, and distance estimates. The researchers also modified the conventional discrete Kalman filter algorithm to enhance its capability of controlling GPS random errors. Based on t-tests and chi-square tests, each smoothing method produces different second-by-second speed and acceleration profiles, except for the Kalman filters. The techniques all provided different travel distance estimates; however, the modified Kalman filter was the most accurate when compared to distance estimates from the onboard vehicle speed sensor (VSS) monitor. The researchers currently recommend the modified Kalman filter as the preferred technique for smoothing GPS data for use in pricing studies. Researchers will continue to evaluate additional smoothing methods as they are identified.

INTRODUCTION

Most transportation-related problems, including traffic congestion, crash frequency, energy consumption, and vehicle emissions, are directly related to vehicle usage rates and driver behavior. To encourage drivers to use vehicles more efficiently and to change driving behavior, a number of incentive programs (commute options, transit and rideshare, parking cash-out, congestion pricing, and value pricing of insurance) are being evaluated as potential transportation demand management (TDM) strategies. Among these incentive programs, pay-as-you-drive (PAYD) insurance and variable congestion tolls have been receiving increased attention from planners and transportation policy makers because these programs will likely reduce vehicle usage rates and improve driver behavior, yielding safety benefits. In addition, on average, such pricing programs should provide significant benefits to consumers through reduced insurance premiums. In implementing future programs, tracking of mileage and location of travel will be an important variable (1). As such, future use of GPS data beyond current freight logistics applications is likely to be instrumental to the implementation of the most refined pricing programs. The accuracy of estimated mileage accrual, speeds by road classification, and even acceleration rates based upon GPS data becomes paramount. PAYD insurance programs are expected to assess insurance premiums based on travel distance and driving speed. For example, Progressive Casualty Insurance Company (Progressive) in the U.S. and Norwich Union of England currently use information on travel time, travel distance, or speed in the insurance premium structure (2, 3).
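To make the distance-based premium concept concrete, the sketch below shows one hypothetical way a cent/mile charge could be assessed from logged mileage. The congestion classes and rate values are invented for illustration only; they do not reflect the actual rating structures of Progressive, Norwich Union, or the Commute Atlanta program.

```python
# Hypothetical cent/mile premium calculation for illustration only; the
# congestion classes and per-mile rates below are assumptions, not actual
# PAYD insurance rates.

# Cents charged per mile, varying with congestion level at the time of travel
RATE_CENTS_PER_MILE = {"free_flow": 2.0, "moderate": 5.0, "congested": 10.0}

def trip_charge_cents(miles_by_level: dict) -> float:
    """Sum per-mile charges over the miles logged in each congestion level."""
    return sum(RATE_CENTS_PER_MILE[level] * miles
               for level, miles in miles_by_level.items())

# Example: a 12-mile commute split across congestion levels
print(trip_charge_cents({"free_flow": 6.0, "moderate": 4.0, "congested": 2.0}))
# -> 6*2.0 + 4*5.0 + 2*10.0 = 52.0 cents
```

Under any such structure, the charge is only as trustworthy as the underlying mileage estimate, which motivates the error analysis that follows.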
The eventual goal of PAYD programs is to evaluate a driver's potential crash risk and to set premiums that are proportional to that risk, accounting for both crash probabilities and the associated damage functions. Hence, insurance companies and customers need to ensure that reliable data are used in such programs. To collect data on vehicle activity and driver behavior, various data measurement devices, such as the distance measurement instrument (DMI), the onboard diagnostics (OBD) system, and the global positioning system (GPS), can be used. Among these devices, the GPS has been the most common choice in transportation research (including PAYD programs) because it provides more useful data, such as travel routes, start and stop points of a trip, travel time, speed, and acceleration rates. Although an accurate data measurement device, as shown in previous studies (4, 5), the GPS is still subject to various systematic and random errors:

• Systematic errors may be due to a low number of satellites, a relatively high Position Dilution of Precision (PDOP) value (which relates to satellite orientation on the horizon and its impact on position precision), and other parameters (for example, antenna placement) that affect the precision and accuracy of the device used (6).
• Random errors may result from satellite orbit, clock, and receiver issues, atmospheric and ionospheric effects, multi-path signal reflection, and signal blockage (4, 5).

While systematic errors can be readily identified and removed, random errors are more difficult to address. Depending upon how the GPS data will be used, and upon the magnitude of the random error effect, it may be necessary to process the GPS data to minimize those effects. Although in smaller research efforts GPS errors can be identified through visual inspection of the data, in deployments that yield large GPS data sets, visual inspection is not practical; because of the significant data processing time involved, automated analysis techniques are required. Statistical smoothing techniques may be useful processing tools, since they not only decrease the impact of random errors on study results but also require less time than visual inspection to detect random errors. Statistical smoothing techniques can be categorized by their statistical backgrounds into three types: the first minimizes overall error terms, the second adjusts the probability of occurrence, and the third recursively applies a feedback system. Although each approach is capable of detecting random errors in GPS data profiles, given their different statistical backgrounds, each technique can produce different outputs. Thus, before adopting a specific smoothing technique for identifying random errors in a GPS data profile, researchers need to understand their characteristics. This study describes the characteristics of three smoothing techniques that are widely used in traffic-related research and that represent different statistical algorithms or backgrounds: the least squares spline approximation, the kernel-based smoothing method, and the Kalman filter.

• The least squares spline approximation minimizes the residual sum of squared errors (RSS) and has a statistical background similar to regression-based smoothing techniques such as local polynomial regression, cubic fits, robust exponential smoothing, and time series models.
• The kernel-based smoothing method adjusts the probability of occurrences in the data stream to modify outliers and has the same statistical background as nearest neighbor smoothing and locally weighted regression models.
• The Kalman filter smooths data points by recursively modifying error estimates through a feedback process.
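To illustrate the second and third categories, the following is a minimal, self-contained sketch of a Gaussian kernel (Nadaraya-Watson) smoother and a discrete Kalman filter with a constant-acceleration state model, applied to a synthetic second-by-second speed trace. The state model, the noise variances q and r, the kernel bandwidth, and the synthetic data are illustrative assumptions; this is not the study's implementation, nor its modified Kalman filter.

```python
# Sketch of kernel-based smoothing and a conventional discrete Kalman filter
# on a synthetic 1 Hz GPS speed trace. All parameter values are assumptions.
import numpy as np


def kernel_smooth(z, bandwidth=3.0):
    """Nadaraya-Watson smoother with a Gaussian kernel over the time index."""
    t = np.arange(z.size)
    out = np.empty(z.size)
    for i in range(z.size):
        w = np.exp(-0.5 * ((t - i) / bandwidth) ** 2)  # kernel weights
        out[i] = np.dot(w, z) / w.sum()                # locally weighted mean
    return out


def kalman_smooth(z, dt=1.0, q=0.5, r=4.0):
    """Discrete Kalman filter; state = [speed, acceleration] with a
    constant-acceleration transition and speed-only measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition
    H = np.array([[1.0, 0.0]])                   # measurement matrix
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])          # process-noise covariance
    R = np.array([[r]])                          # measurement-noise variance
    x = np.array([[z[0]], [0.0]])                # initial state estimate
    P = np.eye(2) * 10.0                         # initial state covariance
    speed, accel = [], []
    for zi in z:
        x = F @ x                                # predict state
        P = F @ P @ F.T + Q                      # predict covariance
        y = zi - (H @ x)[0, 0]                   # innovation (residual)
        S = (H @ P @ H.T + R)[0, 0]              # innovation variance
        K = (P @ H.T) / S                        # Kalman gain (2x1)
        x = x + K * y                            # update state with feedback
        P = (np.eye(2) - K @ H) @ P              # update covariance
        speed.append(x[0, 0])
        accel.append(x[1, 0])
    return np.array(speed), np.array(accel)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    truth = np.concatenate([np.linspace(0, 15, 30),   # accelerate to 15 m/s
                            np.full(60, 15.0),        # cruise
                            np.linspace(15, 0, 30)])  # decelerate to a stop
    gps = truth + rng.normal(0.0, 2.0, truth.size)    # GPS-like random error
    v_kern = kernel_smooth(gps)
    v_kf, a_kf = kalman_smooth(gps)
    # Travel distance via trapezoidal integration of each 1 Hz speed series
    print("raw distance    (m):", np.trapz(gps))
    print("kernel distance (m):", np.trapz(v_kern))
    print("Kalman distance (m):", np.trapz(v_kf))
    print("true distance   (m):", np.trapz(truth))
```

The distance comparison at the end mirrors the study's central concern: because each smoothing approach alters the second-by-second speed profile differently, the choice of technique propagates directly into travel distance estimates.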
This study evaluates one smoothing method within each general category of smoothing techniques. Each smoothing technique is applied to a large GPS data set collected in Atlanta, GA, and then comparatively evaluated for its impact on estimated speed, acceleration, and travel distance profiles. While not exhaustive, the researchers believe that the three smoothing approaches examined are representative of their general statistical categories.

DATA COLLECTION PROCESS

The DRIVE Atlanta Laboratory at the Georgia Institute of Technology (Georgia Tech) developed a wireless data collection system known as the GT Trip Data Collector (GT-TDC). The GT-TDC collects second-by-second vehicle activity data, including vehicle position (latitude and longitude via GPS) and vehicle speed. In addition, the GT-TDC collects ten engine operating parameters from the onboard diagnostics (OBD) system in post-1996 model year vehicles and monitors vehicle speed at 4 Hz from the vehicle speed sensor (VSS); thus, VSS and OBD data were not available from all vehicles in the Commute Atlanta program. The data are integrated into trip files, encrypted, and transmitted to the central server system at Georgia Tech over a cellular wireless connection. Figure 1 illustrates the GT-TDC and its accessories.

FIGURE 1 GT trip data collector.

The GT-TDCs were installed in about 500 light-duty vehicles through the commuter choice and value pricing insurance incentive program (Commute Atlanta). To evaluate the filtering techniques, this study employed GPS data gathered between October and November 2004 from 7 vehicles, which generated 1,702 trips (1,497,066 data points).

Capability of the GPS receiver implemented in the GT-TDC

The GT-TDC integrates the 12-channel SiRF Star II GPS receiver, which is designed for in-car navigation systems. This receiver was selected for the Commute Atlanta program in 2002, after a previous study by Ogle et al. (4) found that it provided performance similar to that of a DGPS receiver for collecting vehicle speed and acceleration once selective availability (SA) was eliminated in 2000. The SiRF Star II GPS receiver calculates the vehicle location based on the C/A code communicated between the satellites and the receiver and separately estimates vehicle speed using the Doppler effect (so the speed estimate is independent of the location estimate). While Real Time Kinematic (RTK) GPS systems can resolve uncertainty in vehicle location and speed estimates, there are four reasons why the research team could not implement an RTK-GPS system in the GT-TDC:

• RTK-GPS equipment is too costly for use in large deployments (the Commute Atl