Cost-efficient partitioning of spatial data on cloud

With the rise of mobile technologies (e.g., smartphones and wearable devices) and location-aware Internet browsers, massive amounts of spatial data are being collected, since such tools allow users to geo-tag the content they create (e.g., photos and tweets). Meanwhile, cloud computing providers such as Amazon and Microsoft let users lease computing resources and charge them for the time each server is reserved, regardless of how heavily it is utilized. One key factor that affects server utilization, especially in data-driven location-based services, is the data partitioning method: if the partitions stored on a server are not accessed, that server remains idle, yet the user is still charged for it. Existing spatial data partitioning techniques, however, aim to 1) cluster spatially close data objects to minimize disk I/O and 2) create equi-sized partitions. Under current cloud pricing models, the objective is different: because servers are billed whether or not they are busy, partitioning should keep them utilized. In this paper, we propose a novel cost-efficient partitioning method for spatial data in which higher server utilization means fewer servers are needed to support the same workload, thus saving cost. Extensive experiments on Amazon EC2 infrastructure demonstrate that our approach is efficient and reduces cost by up to 40%.
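The cost argument above (billing per reserved server, so higher utilization means fewer servers for the same workload) can be illustrated with a small sketch. It is not the method proposed in this paper, which the abstract does not detail. Instead it swaps in first-fit decreasing bin packing, a standard heuristic, and assumes each partition has a known query load expressed as a hypothetical percentage of one server's capacity. The names `Partition`, `pack_by_load`, and `reserved_cost`, along with all loads and prices, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    load: int  # hypothetical: percent of one server's query capacity this partition consumes


def pack_by_load(partitions, capacity=100):
    """Assign partitions to servers with first-fit decreasing bin packing on query load.

    Illustrative only: a generic heuristic, not the paper's partitioning method.
    """
    servers, used = [], []  # partitions per server, and load already placed on each server
    for p in sorted(partitions, key=lambda p: p.load, reverse=True):
        if p.load > capacity:
            raise ValueError(f"partition {p.name} exceeds a single server's capacity")
        for i, u in enumerate(used):
            if u + p.load <= capacity:  # first server with enough spare capacity
                servers[i].append(p)
                used[i] += p.load
                break
        else:  # no existing server fits, so reserve a new one
            servers.append([p])
            used.append(p.load)
    return servers


def reserved_cost(num_servers, hours, price_per_hour):
    # Pay-per-reservation billing: every reserved server-hour is charged, busy or idle.
    return num_servers * hours * price_per_hour


if __name__ == "__main__":
    loads = {"a": 60, "b": 30, "c": 20, "d": 10, "e": 50, "f": 30}  # made-up loads
    parts = [Partition(n, l) for n, l in loads.items()]
    packed = pack_by_load(parts)
    # Baseline: one partition per server vs. load-aware packing.
    print(len(parts), len(packed))  # prints "6 2"
```

With these made-up loads, placing one partition per server reserves six mostly idle machines, while packing by load reaches the same total capacity on two fully used ones. `reserved_cost` then scales linearly with that server count. The numbers are invented and say nothing about the paper's reported savings.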
