Cruncher: Distributed in-memory processing for location-based services

Advances in location-based services (LBS) demand high-throughput processing of both static and streaming data. Many recent systems support distributed main-memory processing to maximize query throughput, but they are not optimized for spatial data. In this demonstration, we showcase Cruncher, a distributed main-memory spatial data warehouse and streaming system. Cruncher extends Spark with adaptive query processing techniques for spatial data. It uses dynamic batch processing to distribute queries and data streams over commodity hardware according to an adaptive partitioning scheme. The batching technique also groups and orders overlapping spatial queries to enable inter-query optimization. The data streams and the offline data share the same partitioning strategy, which enables data co-locality optimization. Furthermore, Cruncher uses an adaptive caching strategy to keep frequently used location data in main memory, and it maintains operational statistics to optimize query processing, data partitioning, and caching at runtime. We demonstrate two LBS applications over Cruncher using real datasets from OpenStreetMap and two synthetic data streams. We demonstrate that Cruncher improves throughput over Spark by an order of magnitude or more when processing spatial data.
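
To make the batching idea concrete, the following is a minimal single-machine sketch, not Cruncher's implementation. It assumes a fixed uniform grid (Cruncher's partitioning is adaptive), axis-aligned rectangular range queries, and hypothetical names (`CELL`, `cell_of`, `cells_overlapping`, `run_batch`) chosen only for illustration. Points and queries are routed with the same grid, so queries that overlap the same cell are grouped and answered with one scan of that cell's data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]

CELL = 10.0  # hypothetical fixed cell size; the paper's scheme adapts partitions at runtime


def cell_of(p: Point) -> Cell:
    """Map a point to the grid cell that contains it."""
    return (int(p[0] // CELL), int(p[1] // CELL))


def cells_overlapping(r: Rect) -> List[Cell]:
    """List every grid cell that a query rectangle touches."""
    x0, y0 = int(r[0] // CELL), int(r[1] // CELL)
    x1, y1 = int(r[2] // CELL), int(r[3] // CELL)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def run_batch(points: List[Point], queries: Dict[str, Rect]) -> Dict[str, List[Point]]:
    """Answer a batch of range queries, scanning each cell once per batch."""
    # Partition the data with the same grid used for the queries (co-location).
    points_by_cell: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        points_by_cell[cell_of(p)].append(p)

    # Group queries by the cells they overlap.
    queries_by_cell: Dict[Cell, List[str]] = defaultdict(list)
    for qid, rect in queries.items():
        for c in cells_overlapping(rect):
            queries_by_cell[c].append(qid)

    results: Dict[str, List[Point]] = {qid: [] for qid in queries}
    for c, qids in queries_by_cell.items():
        for p in points_by_cell.get(c, []):  # single pass over the cell's points
            for qid in qids:
                xmin, ymin, xmax, ymax = queries[qid]
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax:
                    results[qid].append(p)
    return results


if __name__ == "__main__":
    pts = [(1.0, 1.0), (5.0, 5.0), (12.0, 3.0), (25.0, 25.0)]
    qs = {"q1": (0.0, 0.0, 6.0, 6.0), "q2": (4.0, 0.0, 15.0, 6.0)}
    print(run_batch(pts, qs))
```

In this sketch, `q1` and `q2` both overlap the cell at the origin, so that cell's points are read once and tested against both queries. Each point belongs to exactly one cell, so a query never receives the same point twice even when it spans several cells.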
