Load balancing and skew resilience for parallel joins

We address the problem of load balancing for parallel joins. We show that the distribution of the input data received by worker machines and the distribution of the output data they produce are both important for performance. As a result, previous work, which optimizes for either input or output alone, is ineffective for load balancing. To address this, we propose a multi-stage load-balancing algorithm that considers the properties of both input and output data by sampling the original join matrix. To do this efficiently, we propose a novel category of equi-weight histograms. To build them, we exploit state-of-the-art computational geometry algorithms for rectangle tiling. To our knowledge, we are the first to employ tiling algorithms for join load balancing. In addition, we propose a novel, join-specialized tiling algorithm that has drastically lower time and space complexity than existing algorithms. Experiments show that our scheme outperforms state-of-the-art techniques by up to a factor of 15.
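To illustrate why both input and output matter, the following is a minimal sketch and not the algorithm proposed here. It assumes an equi-join on a single key, exact key counts instead of a sample, and a hypothetical linear cost model (`c_in` per input tuple, `c_out` per output tuple). It replaces two-dimensional rectangle tiling with a greedy one-dimensional split of the sorted keys into contiguous ranges of roughly equal weight. The names `key_weights` and `equi_weight_ranges` are illustrative only.

```python
from collections import Counter


def key_weights(r_keys, s_keys, c_in=1.0, c_out=1.0):
    """Per-key weight for an equi-join of R and S.

    Assumed cost model (hypothetical): a worker pays c_in for each input
    tuple it receives and c_out for each output tuple it produces.
    """
    r_cnt, s_cnt = Counter(r_keys), Counter(s_keys)
    weights = {}
    for k in sorted(set(r_cnt) | set(s_cnt)):
        inp = r_cnt[k] + s_cnt[k]   # tuples shipped to the worker
        out = r_cnt[k] * s_cnt[k]   # join results for this key
        weights[k] = c_in * inp + c_out * out
    return weights


def equi_weight_ranges(weights, num_workers):
    """Greedily cut the sorted keys into at most num_workers contiguous
    ranges, closing a range once it reaches the average weight."""
    target = sum(weights.values()) / num_workers
    ranges, current, acc = [], [], 0.0
    for k, w in weights.items():
        current.append(k)
        acc += w
        # Keep the last range open so it absorbs all remaining keys.
        if acc >= target and len(ranges) < num_workers - 1:
            ranges.append((current, acc))
            current, acc = [], 0.0
    if current:
        ranges.append((current, acc))
    return ranges


# Key 1 is heavy in both relations; keys 2-4 appear once on each side.
r = [1, 1, 1, 1, 2, 3, 4]
s = [1, 1, 1, 1, 2, 3, 4]
w = key_weights(r, s)
parts = equi_weight_ranges(w, num_workers=2)
```

In this toy input, key 1 accounts for 8 of the 14 input tuples but 16 of the 19 output tuples, so a split that balances input alone would underestimate the load on the worker that receives it. A single key cannot be divided by a one-dimensional range split, which is one reason to partition the two-dimensional join matrix instead.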
