A Group Mining Method for Big Data on Distributed Vehicle Trajectories in WAN

This paper proposes MCR-ACA, a distributed parallel clustering method that integrates the ant colony algorithm with the Map-Combine-Reduce computing framework to mine groups with the same or similar features from big vehicle trajectory data stored across a wide area network (WAN). The heaviest part of the clustering computation runs in parallel at the local nodes, and the local results are merged into small intermediates. These intermediates are sent to the central node, where the clusters are generated adaptively. MCR-ACA thus avoids the large overhead of transferring high-volume data, which improves computing efficiency while guaranteeing the correctness of the clustering. MCR-ACA is compared with an existing parallel clustering algorithm on real-world big data collected by the traffic monitoring system of Jiangsu Province, China. Experimental results demonstrate that the proposed method is effective for group mining by clustering.
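The data flow described above (cluster locally, ship only small intermediates, merge centrally) can be illustrated with a minimal sketch. It is not the MCR-ACA algorithm itself: the abstract does not give the details of the ant colony step, so a simple distance-threshold ("leader") clustering is swapped in for it here. The names `Summary`, `absorb`, `map_combine`, `reduce_central` and the `radius` parameter are all hypothetical, and trajectories are reduced to 2-D feature points purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Summary:
    """Compact intermediate for one cluster: member count and coordinate sums."""
    n: int
    sx: float
    sy: float

    @property
    def centroid(self) -> Point:
        return (self.sx / self.n, self.sy / self.n)

    def merge(self, other: "Summary") -> "Summary":
        return Summary(self.n + other.n, self.sx + other.sx, self.sy + other.sy)


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def absorb(summaries: List[Summary], item: Summary, radius: float) -> None:
    """Merge `item` into the nearest summary within `radius`, else start a new one.

    ASSUMPTION: this threshold rule stands in for the ant colony clustering step.
    """
    best, best_d = None, radius * radius
    for i, s in enumerate(summaries):
        d = _dist2(s.centroid, item.centroid)
        if d <= best_d:
            best, best_d = i, d
    if best is None:
        summaries.append(item)
    else:
        summaries[best] = summaries[best].merge(item)


def map_combine(points: List[Point], radius: float) -> List[Summary]:
    """Local node: cluster raw points and emit only the small summaries."""
    out: List[Summary] = []
    for x, y in points:
        absorb(out, Summary(1, x, y), radius)
    return out


def reduce_central(per_node: List[List[Summary]], radius: float) -> List[Summary]:
    """Central node: merge the summaries received from all local nodes."""
    merged: List[Summary] = []
    for summaries in per_node:
        for s in summaries:
            absorb(merged, s, radius)
    return merged


if __name__ == "__main__":
    node_a = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0)]  # made-up sample data
    node_b = [(0.1, 0.1), (10.2, 10.0)]
    local = [map_combine(node_a, 1.0), map_combine(node_b, 1.0)]
    clusters = reduce_central(local, 1.0)
    print([c.n for c in clusters])  # cluster sizes: [3, 2]
```

In this sketch only the `Summary` records cross the network, never the raw points, which is the property the abstract credits for avoiding the transfer overhead. A threshold rule like this depends on the order in which points arrive, so it makes no claim about the correctness guarantee stated for MCR-ACA.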
