Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation

Traffic speed data imputation is a fundamental challenge for data-driven transport analysis. In recent years, with the ubiquity of GPS-enabled devices and the widespread use of crowdsourced alternatives for collecting traffic data, transportation professionals increasingly look to such user-generated data for a wide range of analysis, planning, and decision-support applications. However, due to the mechanics of the data collection process, crowdsourced traffic data such as probe vehicle data is highly prone to missing observations, which makes accurate imputation crucial for any application that relies on it. In this paper, we propose the use of multi-output Gaussian processes (GPs) to model the complex spatial and temporal patterns in crowdsourced traffic data. While the Bayesian nonparametric formalism of GPs allows us to model observation uncertainty, the multi-output extension based on convolution processes enables us to capture complex spatial dependencies between nearby road segments. Using six months of crowdsourced traffic speed data (“probe vehicle data”) for several locations in Copenhagen, the proposed approach is empirically shown to significantly outperform popular state-of-the-art imputation methods.
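To make the idea of multi-output GP imputation concrete, the following is a minimal sketch, not the model used in the paper. It replaces the convolution-process construction with a simpler intrinsic coregionalization model, in which the covariance between segment i at time t and segment j at time t' is B[i, j] * k(t, t'). The function names (`rbf`, `impute_icm`), the coregionalization matrix `B`, the lengthscale, and the noise variance are all illustrative assumptions, and the hyperparameters are fixed by hand rather than learned from data.

```python
import numpy as np

def rbf(t1, t2, lengthscale=1.0):
    """Squared-exponential kernel over time stamps (unit variance)."""
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def impute_icm(Y, t, B, lengthscale=1.0, noise=1e-2):
    """Fill NaNs in Y (n_segments x n_times) with the GP posterior mean.

    Assumption: intrinsic coregionalization, K = kron(B, k_time), with
    hand-picked hyperparameters. This is a simplification of the
    convolution-process model named in the abstract.
    """
    n_seg, n_t = Y.shape
    # Row-major flattening of Y matches the block layout of kron(B, K_time).
    K = np.kron(B, rbf(t, t, lengthscale))
    y = Y.ravel()
    obs = ~np.isnan(y)
    # Constant per-segment prior mean, estimated from the observed entries.
    mean = np.nanmean(Y, axis=1).repeat(n_t)
    K_oo = K[np.ix_(obs, obs)] + noise * np.eye(obs.sum())
    alpha = np.linalg.solve(K_oo, y[obs] - mean[obs])
    post = mean + K[:, obs] @ alpha
    out = y.copy()
    out[~obs] = post[~obs]  # observed entries are left untouched
    return out.reshape(n_seg, n_t)

if __name__ == "__main__":
    # Synthetic speeds (km/h) on two neighbouring segments; not real data.
    t = np.linspace(0.0, 10.0, 50)
    truth = np.vstack([50 + 10 * np.sin(t), 48 + 10 * np.sin(t)])
    Y = truth.copy()
    Y[0, 20:28] = np.nan  # simulate a gap on the first segment
    B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed inter-segment correlation
    filled = impute_icm(Y, t, B)
    print("max abs error in gap:", np.abs(filled[0, 20:28] - truth[0, 20:28]).max())
```

The off-diagonal entries of `B` are what let the fully observed second segment inform the gap on the first; setting them to zero reduces the sketch to two independent single-output GPs.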
