Efficient traffic speed forecasting based on massive heterogeneous historical data

Drivers would like to foresee traffic conditions so that they can drive efficiently at all times. Given historical patterns for different locations and times, it should be possible to estimate the traffic speed in the near future. What makes this task both difficult and interesting is that, from a massive amount of historical data, we must select the data that are actually useful for predicting the traffic speed at the next moment. Moreover, traffic conditions can be highly dynamic, so a reliable prediction requires an up-to-date model, which implies frequent retraining. To address this, we propose a lazy learning approach for traffic speed prediction given massive historical data. The approach integrates k-nearest neighbors (kNN) and Gaussian process regression for efficient and robust traffic speed prediction: kNN, run on a big data framework, selects the most informative data for Gaussian process regression. Thanks to recent progress in big data research, processing massive data for prediction in close to real time has now become possible. We aim to use a Hadoop framework for prediction given heterogeneous data, including traffic data (speed, flow, and occupancy) and weather data.
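
The lazy kNN plus Gaussian process idea can be illustrated with a minimal single-machine sketch. It is not the implementation described here: the abstract does not specify the distance measure, the kernel, the value of k, or the hyperparameters, so the Euclidean distance, the squared-exponential kernel, and all default values below are assumptions chosen for illustration, and the brute-force neighbor search stands in for the Hadoop-based selection step. The function names `rbf_kernel` and `knn_gp_predict` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=0.3, signal_var=25.0):
    """Squared-exponential kernel between the rows of A and the rows of B (assumed kernel)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def knn_gp_predict(X_hist, y_hist, x_query, k=20,
                   length_scale=0.3, signal_var=25.0, noise_var=0.01):
    """Lazy prediction: fit a small GP only on the k historical samples nearest to the query.

    X_hist: (n, d) feature matrix (e.g. scaled time of day, flow, occupancy, weather).
    y_hist: (n,) observed speeds. All hyperparameter defaults are placeholders.
    Returns the predictive mean and the latent predictive variance.
    """
    X_hist = np.asarray(X_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    x_query = np.asarray(x_query, dtype=float).reshape(1, -1)

    # Step 1 (kNN): brute-force Euclidean search for the k closest samples.
    k = min(k, len(X_hist))
    dists = np.linalg.norm(X_hist - x_query, axis=1)
    idx = np.argpartition(dists, k - 1)[:k]
    X_nn, y_nn = X_hist[idx], y_hist[idx]

    # Step 2 (GP regression) on the selected neighbors only, with a constant mean.
    y_mean = y_nn.mean()
    K = rbf_kernel(X_nn, X_nn, length_scale, signal_var) + noise_var * np.eye(k)
    k_star = rbf_kernel(X_nn, x_query, length_scale, signal_var)[:, 0]

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_nn - y_mean))
    mean = y_mean + float(k_star @ alpha)

    v = np.linalg.solve(L, k_star)
    var = max(float(signal_var - v @ v), 0.0)
    return mean, var
```

Because no model is trained ahead of time, each query costs one neighbor search plus the factorization of a k-by-k matrix, which is what makes frequent "retraining" cheap in this style of approach. In practice the features would need to be scaled to comparable ranges before computing distances.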
