The Optimization of Hadoop Scheduling Algorithms on Distributed System for Processing Traffic Information

Traffic information retrieval and data mining are not only active research topics and key techniques in intelligent transportation, but also a research problem in the distributed processing of massive data. With the development of urban traffic data acquisition technology, traffic data volumes have grown to the petabyte (PB) scale. To manage these data effectively and to serve intelligent transportation applications, efficient algorithms are needed to process them in a distributed environment. This paper optimizes the Hadoop scheduling algorithm used to process traffic data on a distributed platform and compensates for the shortcomings of traditional algorithms in real-time performance. Experimental results show that, for both compute-intensive and I/O-intensive workloads, the optimized scheduling algorithm achieves the shortest computation time, the best performance, a greater capacity for processing traffic data, and better real-time behavior.