Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale

The analysis of massive datasets requires a large number of processors. Prior research has largely assumed that tracking the actual data distribution and the underlying network structure of a cluster, which we collectively refer to as the topology, comes at a high cost and offers little practical benefit. As a result, theoretical models, algorithms, and systems often assume a uniform topology; however, this assumption rarely holds in practice. This calls for an end-to-end investigation of how to model, design, and deploy topology-aware algorithms for fundamental data processing tasks at large scale. To this end, we first develop a theoretical parallel model that jointly captures the cost of computation and communication. Using this model, we explore algorithms with theoretical guarantees for three basic tasks: aggregation, join, and sorting. Finally, we consider the practical aspects of implementing topology-aware algorithms at scale, and show that they have the potential to be orders of magnitude faster than their topology-oblivious counterparts.
