A cloud‐based taxi trace mining framework for smart city

Smart cities, a well-known application area for big data, rely on large-scale data analysis to achieve efficient management and sustainable development amid worldwide urbanization. An important problem in this setting is how to discover frequent trajectory sequence patterns and how to cluster trajectories. To address it, this paper proposes a cloud-based framework for taxi trajectory pattern mining and trajectory clustering in smart cities. The work comprises three parts: (1) preprocessing raw Global Positioning System (GPS) traces by calling the Baidu Geocoding API; (2) a distributed trajectory pattern mining (DTPM) algorithm based on Spark; and (3) a distributed trajectory clustering (DTC) algorithm based on Spark. Both algorithms overcome high input/output and communication overhead by computing in memory. In addition, DTPM avoids generating redundant local trajectory patterns, which significantly improves its overall performance, and DTC speeds up trajectory similarity computation by transforming it into AND and OR operations. Experimental results indicate that DTPM and DTC significantly improve the performance and scalability of trajectory pattern mining and trajectory clustering on massive taxi trace data. Copyright © 2016 John Wiley & Sons, Ltd.
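The abstract does not spell out how trajectory similarity is reduced to AND and OR operations, so the following is only an illustrative sketch under stated assumptions: each trajectory is treated as a set of road segment identifiers, each set is packed into an integer bitmap, and similarity is taken to be the Jaccard coefficient. The names `encode`, `similarity`, and `segment_index` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: the encoding and the Jaccard measure are
# assumptions, not the DTC algorithm as published.

def encode(trajectory, segment_index):
    """Pack a trajectory (iterable of segment ids) into an integer bitmap.

    segment_index maps each known segment id to a distinct bit position.
    """
    bits = 0
    for segment in trajectory:
        bits |= 1 << segment_index[segment]  # OR sets the bit for this segment
    return bits


def similarity(a, b):
    """Jaccard similarity of two bitmaps using one AND and one OR."""
    union = a | b    # segments visited by either trajectory
    if union == 0:
        return 0.0   # two empty trajectories: define similarity as 0
    shared = a & b   # segments visited by both trajectories
    return bin(shared).count("1") / bin(union).count("1")


# Hypothetical road segments and two taxi trajectories over them.
segment_index = {"s1": 0, "s2": 1, "s3": 2, "s4": 3}
t1 = encode(["s1", "s2", "s3"], segment_index)
t2 = encode(["s2", "s3", "s4"], segment_index)
score = similarity(t1, t2)  # 2 shared segments out of 4 distinct ones
```

With this encoding, comparing two trajectories costs a constant number of word-level operations per machine word of bitmap, rather than a set intersection over segment lists, which is one plausible reason such a transformation would help at scale.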
