A framework for parallel map-matching at scale using Spark

Map-matching is the problem of matching recorded GPS trajectories to a digital representation of the road network. GPS data may be inaccurate and heterogeneous due to limitations of, or errors in, electronic sensors, as well as legal restrictions. Accurately matching trajectories to the road map is an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the sheer volume of available GPS trajectories and map data challenges the scalability of current map-matching algorithms. These algorithms are limited to small datasets because they focus on matching accuracy rather than scalability. Therefore, we propose a distributed parallel framework for efficient and scalable offline map-matching on top of the Spark framework. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation over large datasets. Spark, however, offers limited support for dynamic map-matching, and its memory consumption can become an issue for very large datasets. Our framework enables map-matching on top of Spark while achieving horizontal scalability and memory-efficient execution, and while maintaining the accuracy of state-of-the-art matching algorithms: (1) we combine a sampling-based Quadtree spatial partitioning construction with batch-based computation to achieve horizontal scalability of map-matching and to reduce cluster memory usage; (2) we employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects; and (3) we provide a cost function for the distributed map-matching workload in order to tune the framework parameters. Our extensive experiments demonstrate that the framework is efficient and scalable for map-matching on large-scale data, while maintaining matching accuracy and low memory usage.
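The first two contributions, sampling-based Quadtree partitioning and safe spatial boundaries, can be illustrated with a small sketch. It rests on several assumptions. The Quadtree is built from a uniform random sample of GPS points. A node is split into four equal quadrants until it holds at most `capacity` sampled points or reaches `max_depth`. The safe boundary is modeled by expanding every leaf by a fixed `margin` and replicating any point that falls inside the expanded region. All names (`Box`, `build_leaves`, `assign`, `partition_points`) and parameters are hypothetical and do not come from the paper. The framework's actual splitting rule and boundary definition may differ.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box with closed bounds (hypothetical helper)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

    def expand(self, margin):
        # Assumed "safe boundary": grow the box by a fixed margin on every side.
        return Box(self.min_x - margin, self.min_y - margin,
                   self.max_x + margin, self.max_y + margin)

    def quadrants(self):
        # Order matches the bucket index used in build_leaves: (x >= cx) + 2 * (y >= cy).
        cx = (self.min_x + self.max_x) / 2
        cy = (self.min_y + self.max_y) / 2
        return [Box(self.min_x, self.min_y, cx, cy), Box(cx, self.min_y, self.max_x, cy),
                Box(self.min_x, cy, cx, self.max_y), Box(cx, cy, self.max_x, self.max_y)]


def build_leaves(box, sample, capacity, max_depth, depth=0):
    """Split `box` into quadrants until each leaf holds <= capacity sampled points."""
    if len(sample) <= capacity or depth >= max_depth:
        return [box]
    cx = (box.min_x + box.max_x) / 2
    cy = (box.min_y + box.max_y) / 2
    buckets = [[], [], [], []]
    for x, y in sample:
        # Each sampled point goes to exactly one quadrant.
        buckets[(x >= cx) + 2 * (y >= cy)].append((x, y))
    leaves = []
    for quad, bucket in zip(box.quadrants(), buckets):
        leaves.extend(build_leaves(quad, bucket, capacity, max_depth, depth + 1))
    return leaves


def assign(points, leaves, margin):
    """Map each leaf index to every point inside that leaf's expanded box."""
    expanded = [leaf.expand(margin) for leaf in leaves]
    parts = {i: [] for i in range(len(leaves))}
    for p in points:
        for i, box in enumerate(expanded):
            if box.contains(*p):
                parts[i].append(p)  # points near a boundary may be replicated
    return parts


def partition_points(points, sample_fraction, capacity, max_depth, margin, seed=0):
    """Build leaves from a sample, then assign the full dataset to them."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    root = Box(min(xs), min(ys), max(xs), max(ys))  # root covers the full dataset
    k = max(1, min(len(points), int(len(points) * sample_fraction)))
    sample = random.Random(seed).sample(points, k)
    leaves = build_leaves(root, sample, capacity, max_depth)
    return leaves, assign(points, leaves, margin)


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(10) for j in range(10)]
    leaves, parts = partition_points(grid, sample_fraction=0.5, capacity=10,
                                     max_depth=6, margin=0.05)
    print(len(leaves), sum(len(v) for v in parts.values()))
```

The tree is built from the sample, so `capacity` bounds the number of *sampled* points per leaf. Each leaf therefore holds roughly `capacity / sample_fraction` points of the full data. On Spark, the assignment step would map naturally onto an RDD `flatMap` that emits `(leaf_id, point)` pairs, followed by grouping on `leaf_id`. The brute-force loop over leaves here is only for clarity.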
