Large-scale spatial join query processing in Cloud

The rapidly increasing amount of location data available in many applications has made it desirable to process large-scale spatial queries over such data in the Cloud for performance and scalability. We report the design and implementation of two prototype systems that are ready for Cloud deployment: SpatialSpark, based on Apache Spark, and ISP-MC, based on Cloudera Impala. Both systems support indexed spatial joins based on point-in-polygon tests and point-to-polyline distance computation. Experiments on the pickup locations of ~170 million taxi trips in New York City and ~10 million global species occurrence records demonstrate both efficiency and scalability on Amazon EC2 clusters.
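
The two geometric predicates named in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the SpatialSpark or ISP-MC code: all function names are hypothetical, the ray-casting test is one common way to implement a point-in-polygon check, and the bounding-box prefilter is only an assumed stand-in for the spatial index, which the abstract does not describe.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_to_polyline_distance(px, py, line):
    """Minimum distance from a point to any segment of a polyline."""
    return min(
        point_to_segment_distance(px, py, *line[i], *line[i + 1])
        for i in range(len(line) - 1)
    )


def bbox(ring):
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def join_points_to_polygons(points, polygons):
    """Nested-loop join with a bounding-box prefilter (assumed stand-in for an index)."""
    boxes = [bbox(ring) for ring in polygons]
    pairs = []
    for pi, (x, y) in enumerate(points):
        for gi, ring in enumerate(polygons):
            minx, miny, maxx, maxy = boxes[gi]
            if minx <= x <= maxx and miny <= y <= maxy:
                if point_in_polygon(x, y, ring):
                    pairs.append((pi, gi))
    return pairs
```

In a distributed setting, the nested loop would presumably be replaced by an index lookup over partitioned data; the sketch only fixes the semantics of the two join predicates.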
