Paper information: Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale

The analysis of massive datasets requires a large number of processors. Prior research has largely assumed that tracking the actual data distribution and the underlying network structure of a cluster, which we collectively refer to as the topology, incurs a high cost and yields little practical benefit. As a result, theoretical models, algorithms, and systems often assume a uniform topology; however, this assumption rarely holds in practice. This calls for an end-to-end investigation of how to model, design, and deploy topology-aware algorithms for fundamental data processing tasks at large scale. To this end, we first develop a theoretical parallel model that jointly captures the cost of computation and communication. Using this model, we explore algorithms with theoretical guarantees for three basic tasks: aggregation, join, and sorting. Finally, we consider the practical aspects of implementing topology-aware algorithms at scale, and show that they have the potential to be orders of magnitude faster than their topology-oblivious counterparts.

Paraschos Koutris | Anastasios Sidiropoulos | Spyros Blanas
