TrajSpark: A Scalable and Efficient In-Memory Management System for Big Trajectory Data

The widespread use of mobile positioning devices has generated big trajectory data. Existing disk-based trajectory management systems can no longer provide scalable, low-latency query services. In view of that, we present TrajSpark, a distributed in-memory system that offers efficient management of trajectory data. TrajSpark introduces a new abstraction called IndexTRDD to manage trajectory segments, and exploits a global and local indexing mechanism to accelerate trajectory queries. Furthermore, to alleviate the inherent partitioning overhead, it adopts a time-decay model to monitor changes in the data distribution and updates the data-partition structure adaptively. This model avoids repartitioning existing data when a new batch of data arrives. Extensive experiments on three types of trajectory queries over both real and synthetic datasets demonstrate that TrajSpark outperforms state-of-the-art systems.
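To make the time-decay idea concrete, the following is a minimal sketch and not TrajSpark's actual implementation. It assumes that points are bucketed into grid cells, that the weights of older batches are multiplied by a fixed decay factor whenever a new batch arrives, and that drift is measured with the Jensen-Shannon divergence. The abstract does not specify the measure, and the names `DecayedDistribution`, `js_divergence`, `needs_repartition`, and the threshold value are all hypothetical.

```python
import math
from collections import defaultdict


class DecayedDistribution:
    """Time-decayed histogram of points per grid cell (hypothetical helper)."""

    def __init__(self, decay=0.5):
        # Assumed decay factor: multiplier applied to old weights per new batch.
        self.decay = decay
        self.weights = defaultdict(float)

    def add_batch(self, cells):
        # Age the existing weights, then add the new batch at full weight.
        for cell in self.weights:
            self.weights[cell] *= self.decay
        for cell in cells:
            self.weights[cell] += 1.0

    def normalized(self):
        # Return the weights as a probability distribution over cells.
        total = sum(self.weights.values())
        if total == 0:
            return {}
        return {c: w / total for c, w in self.weights.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two dict distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL divergence of a from the mixture m, skipping zero-probability cells.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def needs_repartition(reference, current, threshold=0.1):
    # Assumed policy: rebuild the partition structure for future batches only
    # when the decayed distribution has drifted far enough from the one the
    # current partitioning was built on. Existing data is left where it is.
    return js_divergence(reference, current) > threshold
```

In this sketch, a system would keep the distribution that the current partitioner was built from as `reference`, call `add_batch` for each arriving batch, and derive a new partitioner for subsequent batches only when `needs_repartition` returns true.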
