Missing traffic data: comparison of imputation methods

Many traffic management and control applications require highly complete and accurate traffic flow data. However, some traffic flow data are commonly lost for reasons such as sensor failure or transmission error. As a result, a wide range of methods for estimating missing traffic data have been proposed over the last two decades. These imputation methods can generally be categorised into three kinds: prediction methods, interpolation methods and statistical learning methods. To assess their performance, this paper compares the methods in terms of reconstruction error, statistical behaviour and running speed. Results show that statistical learning methods are more effective than the other two kinds of imputation methods when data from a single detector are used. Among the methods tested, probabilistic principal component analysis (PPCA) yields the best performance in all aspects. Numerical tests demonstrate that PPCA can be used to impute data online before further analysis (e.g. traffic prediction) and is robust to weather changes.
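To make the comparison criteria concrete, the following minimal sketch fills the gaps in a single detector's flow series with one simple interpolation method and then scores the result by reconstruction error. It is illustrative only: the function names, the choice of linear interpolation, the use of RMSE as the error measure and the sample values are all assumptions, not the methods or data evaluated in the paper.

```python
import math


def interpolate_missing(series):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; gaps at either end copy the nearest observed value.
    (Hypothetical helper, not taken from the paper.)"""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, missing_idx):
    """Root-mean-square reconstruction error over the imputed positions only."""
    squared = [(truth[i] - estimate[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))


# Made-up flow counts (vehicles per interval) with three artificially removed points.
truth = [100, 120, 140, 180, 200, 190]
observed = [100, None, 140, None, None, 190]
missing_idx = [i for i, v in enumerate(observed) if v is None]

filled = interpolate_missing(observed)
err = rmse(truth, filled, missing_idx)
print(filled, err)
```

Removing known values and scoring only those positions is what allows different imputation methods to be compared on the same series; any other method could be substituted for `interpolate_missing` and scored with the same `rmse` call.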
