Traffic information retrieval and data mining are not only active research topics and key techniques in intelligent transportation, but also a research problem in the distributed processing of massive data. With the development of urban traffic data acquisition technology, the volume of traffic data has grown to the petabyte (PB) level. To manage these data effectively and put them to use in intelligent transportation, efficient algorithms are needed to process them in a distributed environment. This paper optimizes the Hadoop scheduling algorithm used to process traffic data on a distributed platform, compensating for the weak real-time performance of traditional scheduling algorithms. Experimental results show that, for both compute-intensive and I/O-intensive workloads, the optimized scheduling algorithm achieves the shortest computation time, the best performance, a greater capacity for processing traffic data, and better real-time behavior.