JUST: JD Urban Spatio-Temporal Data Engine

With the prevalence of positioning techniques, a prodigious amount of spatio-temporal data is generated constantly. Sophisticated urban applications built on spatio-temporal data, e.g., location-based services, need a spatio-temporal data management system that is efficient, scalable, update-enabled, and easy to use. This paper presents JUST (JD Urban Spatio-Temporal data engine), which can efficiently manage big spatio-temporal data in a convenient way. JUST uses Apache HBase, a distributed NoSQL data store, as the underlying storage, GeoMesa as the spatio-temporal data indexing tool, and Apache Spark as the execution engine. We design two novel indexing techniques, Z2T and XZ2T, which accelerate spatio-temporal queries significantly. Furthermore, we introduce a compression mechanism that not only greatly reduces storage cost but also improves query efficiency. To make JUST easy to use, we design and implement a complete SQL engine, with which all operations can be performed through a SQL-like query language, JustQL. JUST also inherently supports new data insertions and historical data updates without index reconstruction. JUST is deployed as a PaaS in JD with multi-user support, and many applications have been developed based on the SDKs that JUST provides. Extensive experiments comparing JUST with six state-of-the-art distributed spatio-temporal data management systems on two real datasets and one synthetic dataset show that JUST has competitive query performance and is much more scalable than these systems.
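The abstract only names the Z2T index. A rough illustration of the general idea behind time-binned Z-order keys follows. It assumes, as the name suggests, that Z2T pairs a coarse time period with a 2D Z-order (Morton) value of the location. The one-day bin width, the 16-bit per-axis precision, the 8-byte key layout, and every function name here are hypothetical assumptions for illustration. None of them is JUST's actual key format.

```python
from datetime import datetime, timezone
from typing import Tuple

BITS = 16            # hypothetical per-axis precision (16 bits each -> 32-bit Z2 value)
BIN_SECONDS = 86400  # hypothetical time-period width: one day


def normalize(value: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Map a coordinate in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * cells)


def interleave(x: int, y: int, bits: int = BITS) -> int:
    """Z-order (Morton) encoding: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2t_key(lng: float, lat: float, ts: datetime) -> Tuple[int, int]:
    """Return (time_bin, z2): the coarse time period first, then the spatial Z2 value in it."""
    time_bin = int(ts.timestamp()) // BIN_SECONDS
    z2 = interleave(normalize(lng, -180.0, 180.0), normalize(lat, -90.0, 90.0))
    return time_bin, z2


def to_row_key(time_bin: int, z2: int) -> bytes:
    """Fixed-width big-endian bytes, so byte-wise key order equals (time_bin, z2) order.

    Assumes timestamps at or after 1970-01-01 (negative bins do not fit unsigned bytes).
    """
    return time_bin.to_bytes(4, "big") + z2.to_bytes(4, "big")


# Usage: a point in Beijing on 2020-06-01 08:00 UTC falls into day bin 18414.
t = datetime(2020, 6, 1, 8, 0, tzinfo=timezone.utc)
print(z2t_key(116.40, 39.90, t)[0])  # 18414
```

Because the time bin leads the key, a query over the interval [t1, t2] only needs to visit the bins from t1 // BIN_SECONDS to t2 // BIN_SECONDS. Within each of those bins, the spatial window maps to a set of Z2 ranges. This is the usual rationale for such keys. How JUST actually decomposes queries, and how XZ2T handles objects with spatial extent (for example via XZ-ordering), is not shown here.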
