Processing multi-way spatial joins on map-reduce

In this paper we investigate the problem of processing multi-way spatial joins on a map-reduce platform. We consider two common spatial predicates, overlap and range. For each of these two classes of join queries, we discuss the challenges and outline novel approaches for executing them on a map-reduce framework. We then discuss how to process join queries that involve both overlap and range predicates. Specifically, we present the Controlled-Replicate framework, on which the approaches in this paper are built. The framework is carefully engineered to minimize communication among cluster nodes. Through experimental evaluation, we examine the complexity of the problem and the details of the Controlled-Replicate framework, and demonstrate that the proposed approaches comfortably outperform naive approaches.
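
To make the two predicates concrete, the following is a minimal single-machine sketch, not the Controlled-Replicate framework itself. It assumes objects are axis-aligned rectangles stored as `(x_min, y_min, x_max, y_max)` tuples, that "range" means the minimum distance between two rectangles is at most a threshold `d`, and that the query is a three-relation chain; the function names are hypothetical and chosen only for illustration. The brute-force join stands in for the kind of naive baseline the abstract refers to.

```python
from itertools import product
from math import hypot

# Assumption: a rectangle is a tuple (x_min, y_min, x_max, y_max).

def overlaps(a, b):
    """Overlap predicate: True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def min_distance(a, b):
    """Minimum Euclidean distance between two rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def within_range(a, b, d):
    """Range predicate (assumed meaning): rectangles lie within distance d."""
    return min_distance(a, b) <= d

def naive_chain_join(r1, r2, r3, d):
    """Brute-force 3-way join: r1 overlaps r2 AND r2 is within distance d of r3.

    Enumerates the full cross product, so it is only a reference baseline.
    """
    return [
        (a, b, c)
        for a, b, c in product(r1, r2, r3)
        if overlaps(a, b) and within_range(b, c, d)
    ]
```

The cross product in `naive_chain_join` grows with the product of the relation sizes, which is the cost a distributed approach has to avoid by limiting how many rectangles are shipped between nodes.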
