A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system with two key modules: offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel $k$-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data, using leave-one-out cross-validation, shows that TFPC significantly outperforms four state-of-the-art prediction approaches (autoregressive integrated moving average, naïve Bayes, multilayer perceptron neural networks, and NN regression) in terms of accuracy, improving it by up to 90.07% in the best case, with an average mean absolute percentage error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
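To make the ideas in the abstract concrete, the following is a minimal single-process sketch of a correlation-based nearest neighbor predictor and of the mean absolute percentage error (MAPE) metric. It is not the TFPC algorithm itself: the abstract does not specify how correlation enters the classifier, so the use of Pearson correlation as the similarity measure, the averaging of neighbor outcomes, and the names `pearson`, `knn_predict`, and `mape` are all assumptions made for illustration. No MapReduce or Hadoop code is shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant window carries no correlation information
    return cov / sqrt(var_a * var_b)


def knn_predict(history, current, k):
    """Predict the next flow value from the k most correlated past windows.

    history: list of (window, next_value) pairs from historical data
             (an assumed layout, not taken from the paper).
    current: the most recent observed window.
    """
    neighbors = sorted(
        history, key=lambda item: pearson(item[0], current), reverse=True
    )[:k]
    return sum(next_value for _, next_value in neighbors) / len(neighbors)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    history = [([1, 2, 3], 4), ([3, 2, 1], 0), ([2, 4, 6], 8)]
    print(knn_predict(history, [10, 20, 30], k=2))
    print(mape([100, 200], [90, 220]))
```

In a distributed setting, the similarity scoring over historical windows is the part that would naturally be spread across mappers, with a reducer selecting the top k; how the paper actually partitions this work is not stated in the abstract.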
