Efficient missing data imputing for traffic flow by considering temporal and spatial dependence

The missing data problem remains a difficulty in a wide variety of transportation applications, e.g. traffic flow prediction and traffic pattern recognition. To address this problem, numerous algorithms have been proposed over the last decade to impute missing data. However, few existing studies have fully used the traffic flow information of neighboring detection points to improve imputation performance. In this paper, the probabilistic principal component analysis (PPCA) based imputation method, which has been shown to be one of the most effective imputation methods that do not use temporal or spatial dependence, is extended to utilize information from multiple detection points. We systematically examine the potential benefits of multi-point data fusion and study the possible influence of measurement time lags. Tests indicate that the hidden temporal–spatial dependence is nonlinear and can be better retrieved by a kernel probabilistic principal component analysis (KPPCA) based method than by the PPCA method. Comparisons show that imputation errors can be notably reduced if temporal–spatial dependence is appropriately considered.
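To make the multi-point PPCA idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical layout in which each day becomes one row formed by concatenating the target detector's series with the series of neighboring detectors, each optionally delayed by a time lag. Missing entries are then filled by a generic EM algorithm for PPCA with missing values, using the closed-form Tipping–Bishop M-step. The helper names `shift_series`, `stack_detectors`, and `ppca_impute`, the lag of one sample, the latent dimension `q=2`, and the synthetic flows are all illustrative assumptions. The nonlinear KPPCA variant is not shown.

```python
import numpy as np


def shift_series(s, lag):
    """Delay a 1-D series by `lag` samples, padding the start with NaN (hypothetical helper)."""
    if lag == 0:
        return s.copy()
    out = np.full_like(s, np.nan)
    out[lag:] = s[:-lag]
    return out


def stack_detectors(series, lags, steps_per_day):
    """One row per day: [target | neighbour_1 (lagged) | ...] (assumed layout)."""
    blocks = [shift_series(np.asarray(s, dtype=float), lag).reshape(-1, steps_per_day)
              for s, lag in zip(series, lags)]
    return np.hstack(blocks)


def ppca_impute(X, q, n_iter=200, tol=1e-6):
    """Fill NaNs in X (n x d) by EM for PPCA: rows ~ N(mu, W W^T + sigma2 I)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    S_miss = np.zeros((d, d))  # summed conditional covariances of missing entries
    for _ in range(n_iter):
        # M-step: Tipping-Bishop closed form applied to the expected sample covariance.
        mu = Xf.mean(axis=0)
        D = Xf - mu
        S = (D.T @ D + S_miss) / n
        evals, evecs = np.linalg.eigh(S)
        evals, evecs = evals[::-1], evecs[:, ::-1]  # descending order
        sigma2 = max(evals[q:].mean(), 1e-8)
        W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
        C = W @ W.T + sigma2 * np.eye(d)
        # E-step: Gaussian conditional mean/covariance of missing given observed.
        old = Xf[miss].copy()
        S_miss = np.zeros((d, d))
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if not o.any():  # whole row missing: fall back to the model mean
                Xf[i] = mu
                S_miss += C
                continue
            K = np.linalg.solve(C[np.ix_(o, o)], C[np.ix_(o, m)]).T  # C_mo C_oo^-1
            Xf[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            S_miss[np.ix_(m, m)] += C[np.ix_(m, m)] - K @ C[np.ix_(o, m)]
        if np.max(np.abs(Xf[miss] - old), initial=0.0) < tol:
            break
    return Xf


# Usage on synthetic data: three detectors sharing a daily profile (illustrative only).
rng = np.random.default_rng(0)
n_days, T = 60, 24
profile = 200 + 150 * np.sin(np.pi * np.arange(T) / T)
amp = rng.uniform(0.7, 1.3, n_days)  # day-to-day demand level
flows = [(c * amp[:, None] * profile + rng.normal(0, 3, (n_days, T))).ravel()
         for c in (1.0, 0.9, 1.1)]
mask = rng.random(flows[0].size) < 0.1  # drop ~10% of the target detector
target_obs = flows[0].copy()
target_obs[mask] = np.nan

X = stack_detectors([target_obs, flows[1], flows[2]], lags=[0, 1, 1], steps_per_day=T)
imputed = ppca_impute(X, q=2)

truth = flows[0].reshape(n_days, T)
miss_t = mask.reshape(n_days, T)
rmse_ppca = np.sqrt(np.mean((imputed[:, :T][miss_t] - truth[miss_t]) ** 2))
col_mean = np.broadcast_to(np.nanmean(X[:, :T], axis=0), (n_days, T))
rmse_mean = np.sqrt(np.mean((col_mean[miss_t] - truth[miss_t]) ** 2))
print(f"PPCA RMSE {rmse_ppca:.2f} vs column-mean RMSE {rmse_mean:.2f}")
```

Stacking neighbors into the same row is what lets the E-step borrow spatial information: the conditional mean of a missing target value is a regression on every observed entry of that day, including the lagged neighbor readings. The NaNs created by the lag shift are treated as ordinary missing values, so the same routine absorbs them.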
