GeoMatch: Efficient Large-Scale Map Matching on Apache Spark

We present GeoMatch, a novel, scalable, and efficient big-data pipeline for large-scale map matching on Apache Spark. GeoMatch improves on existing spatial big-data solutions by using a novel spatial partitioning scheme inspired by Hilbert space-filling curves. This partitioning scheme lets GeoMatch balance operations effectively across processing units and achieve significant performance gains. We demonstrate the effectiveness of GeoMatch through rigorous and extensive benchmarks on large-scale urban spatial data sets ranging from 166,253 to 3.78 billion location measurements. Our results show over 17-fold performance improvements compared to previous works while achieving better processing accuracy than current solutions (97.48%).
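To make the idea of Hilbert-curve-inspired partitioning concrete, the following minimal Python sketch maps 2D points to grid cells, orders the cells along a Hilbert curve, and cuts the ordered sequence into partitions of roughly equal size. It is an illustration of the general technique only, not GeoMatch's actual scheme: the function names (`hilbert_index`, `partition_points`), the bounding-box argument, and the equal-count splitting rule are all assumptions introduced here.

```python
import math


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve (standard iterative conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_points(points, bbox, n, num_partitions):
    """Hypothetical helper: split points into num_partitions groups of
    roughly equal size, ordered by the Hilbert index of their grid cell.

    bbox is (min_x, min_y, max_x, max_y); n is the grid side length.
    """
    min_x, min_y, max_x, max_y = bbox

    def cell(value, lo, hi):
        # Clamp so points on the upper boundary fall into the last cell.
        return min(n - 1, int((value - lo) / (hi - lo) * n))

    ordered = sorted(
        points,
        key=lambda p: hilbert_index(
            n, cell(p[0], min_x, max_x), cell(p[1], min_y, max_y)
        ),
    )
    size = math.ceil(len(ordered) / num_partitions)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.9, 0.9), (0.1, 0.9), (0.9, 0.1),
           (0.2, 0.2), (0.8, 0.8), (0.2, 0.8), (0.8, 0.2)]
    for i, part in enumerate(partition_points(pts, (0, 0, 1, 1), 4, 4)):
        print(i, part)
```

Because consecutive Hilbert indices always refer to neighbouring grid cells, points that end up in the same partition tend to be close together in space, while splitting by count rather than by area keeps partitions similar in size even when the data are skewed toward dense urban regions.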
