Robust causal dependence mining in big data network and its application to traffic flow predictions

Abstract

In this paper, we focus on a particular problem in transportation studies related to the so-called “Big Data” challenge: how can concise yet accurate traffic flow prediction models be built from the massive data collected by different sensors? The size of the data, the hidden causal dependence within it, and the complexity of traffic time series are among the obstacles to making reliable forecasts at a reasonable cost in both time and computation. To better prepare the data for traffic modeling, we introduce a multi-step strategy that processes the raw “Big Data” into compact time series better suited for regression and causality analysis. First, we use Granger causality to define and determine the potential dependence among the data, producing a much smaller set of time series that are also highly dependent. Next, we apply a decomposition algorithm to separate the daily-similar trend components from the nonstationary burst components of the traffic flow time series retained by the Granger test. The decomposition results are then processed by two rounds of Lasso regression: the standard Lasso is first used to quickly filter out most of the irrelevant data, and a robust Lasso then removes the disturbance caused by the burst components and recovers the strongest dependence among the remaining data. Test results show that the proposed method significantly reduces the cost of building prediction models. Moreover, the resulting causal dependence graph reveals the relationship between the structure of the road network and the correlations among traffic time series. These findings are useful for building better traffic flow prediction models.
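The first two preprocessing steps can be illustrated with a minimal NumPy sketch. It implements the classical bivariate Granger F-test: fit an AR(p) model of the target, fit a second model that also includes p lags of a candidate series, and compare their residual sums of squares. Like any Granger test, it assumes roughly stationary series. The lag order `p=2` and the fixed cut-off `f_threshold=10.0` are illustrative assumptions; in practice one would convert the statistic to a p-value, for example with `scipy.stats.f.sf(F, p, n - k)`. The abstract does not specify the decomposition algorithm, so `split_trend_burst` is a hypothetical stand-in, not the authors' method. It takes the per-time-of-day median across days as the daily-similar trend and treats the remainder as the burst component.

```python
import numpy as np


def lagged_design(series_list, p):
    """Intercept column plus lags 1..p of each series; row i corresponds to time t = p + i."""
    T = len(series_list[0])
    cols = [s[p - k : T - k] for s in series_list for k in range(1, p + 1)]
    return np.column_stack([np.ones(T - p)] + cols)


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(y, x, p=2):
    """F statistic for H0: the p lags of x do not improve an AR(p) model of y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    target = y[p:]
    X_r = lagged_design([y], p)      # restricted model: y's own lags only
    X_u = lagged_design([y, x], p)   # unrestricted model: y's lags plus x's lags
    rss_r, rss_u = rss(X_r, target), rss(X_u, target)
    n, k_u = X_u.shape
    return ((rss_r - rss_u) / p) / (rss_u / (n - k_u))


def granger_screen(target, candidates, p=2, f_threshold=10.0):
    """Hypothetical screen: keep candidates whose F statistic exceeds a fixed cut-off."""
    return [name for name, x in candidates.items() if granger_f(target, x, p) > f_threshold]


def split_trend_burst(series, samples_per_day):
    """Hypothetical stand-in decomposition (length must be a whole number of days):
    the median time-of-day profile is the trend, the remainder is the burst part."""
    days = np.asarray(series, float).reshape(-1, samples_per_day)
    trend = np.tile(np.median(days, axis=0), days.shape[0])
    return trend, days.ravel() - trend


# Synthetic check: x drives y with a one-step lag, z is unrelated.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
z = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.3 * rng.normal(size=T - 1)
print(granger_screen(y, {"x": x, "z": z}))

# Seven synthetic days of 96 fifteen-minute counts with one injected burst.
profile = 100 + 50 * np.sin(np.linspace(0, 2 * np.pi, 96, endpoint=False))
flow = np.tile(profile, 7)
flow[300] += 80.0
trend, burst = split_trend_burst(flow, 96)
print(int(np.argmax(np.abs(burst))), round(float(burst[300]), 6))
```

The median is used for the trend so that a single burst day does not pull the daily profile toward it. The burst then shows up almost entirely in the remainder.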

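The two Lasso rounds can be sketched in the same spirit, assuming a linear model without intercept. Round 1 fits a standard Lasso by cyclic coordinate descent and keeps only the series with nonzero coefficients. Round 2 refits those survivors with an L1-penalized Huber loss, solved by proximal gradient descent, so that large burst residuals contribute only a bounded amount to the gradient. The abstract does not say which robust Lasso the authors use. The Huber-loss variant, the function names, and the values of `lam_screen`, `lam_robust` and `delta` are assumptions for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Round 1: cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)          # residual y - Xb for b = 0
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * b[j]           # partial residual that excludes feature j
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def huber_lasso(X, y, lam, delta=1.0, n_iter=2000):
    """Round 2: proximal gradient for (1/n) sum huber_delta(y - Xb) + lam*||b||_1."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / upper bound on the gradient's Lipschitz constant
    b = np.zeros(d)
    for _ in range(n_iter):
        psi = np.clip(y - X @ b, -delta, delta)   # Huber score: large residuals are capped
        grad = -X.T @ psi / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def two_round_lasso(X, y, lam_screen, lam_robust, delta=1.0):
    """Screen with the standard Lasso, then refit the survivors with the Huber Lasso."""
    keep = np.flatnonzero(lasso_cd(X, y, lam_screen))
    coef = np.zeros(X.shape[1])
    if keep.size:
        coef[keep] = huber_lasso(X[:, keep], y, lam_robust, delta)
    return coef


# Synthetic check: 30 candidate series, only columns 2 and 7 matter,
# and 20 samples carry large positive bursts.
rng = np.random.default_rng(1)
n, d = 400, 30
X = rng.normal(size=(n, d))
beta_true = np.zeros(d)
beta_true[[2, 7]] = [1.5, -1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[rng.choice(n, size=20, replace=False)] += 15.0
coef = two_round_lasso(X, y, lam_screen=0.1, lam_robust=0.02, delta=0.5)
print(np.flatnonzero(np.abs(coef) > 0.3), coef[[2, 7]].round(3))
```

The screening round uses the squared loss because it is cheap and only needs to be roughly right about which series matter. The Huber round then does the careful estimation. Capping each residual's score at `delta` keeps the injected bursts from dominating the gradient.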