Abstract cost models for distributed data-intensive computations

We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. To find a job partitioning that minimizes running time, a cost model, which we more accurately refer to as a makespan model, is needed. Seeking the simplest such model that is still sufficiently accurate, we explore piecewise linear functions of input, output, and computational complexity. These models are abstract in the sense that they capture fundamental algorithm properties but do not require explicit modeling of system and implementation details such as the number of disk accesses. We show how this simplified functional structure can be exploited to reduce optimization cost. In the general case, we identify a lower bound that can be used for search-space pruning. For applications with homogeneous tasks, we further demonstrate how to integrate the model directly into the makespan optimization process, reducing search-space dimensionality, and thus complexity, by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of operators and cluster architectures.
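To make the idea of a piecewise linear makespan model concrete, the following is a minimal illustrative sketch and not the formulation used in this work. It assumes a convex piecewise linear form (the maximum over a few linear pieces), that homogeneous tasks split input, output, and computation evenly, and that tasks run in waves over the workers. All names (`Segment`, `predict_makespan`, `best_task_count`) and all coefficients are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One linear piece: b0 + b_in * I + b_out * O + b_comp * C."""
    b0: float
    b_in: float
    b_out: float
    b_comp: float

    def cost(self, i: float, o: float, c: float) -> float:
        return self.b0 + self.b_in * i + self.b_out * o + self.b_comp * c


# Hypothetical coefficients, chosen only for illustration.
EXAMPLE_SEGMENTS = (
    Segment(b0=1.0, b_in=0.5, b_out=0.2, b_comp=0.1),
    Segment(b0=0.0, b_in=1.0, b_out=0.2, b_comp=0.1),
)


def predict_makespan(segments, i: float, o: float, c: float) -> float:
    """Assumed model form: the maximum over all linear pieces."""
    return max(s.cost(i, o, c) for s in segments)


def best_task_count(segments, total_in, total_out, total_comp,
                    workers: int, max_tasks: int):
    """Scan task counts 1..max_tasks and return (tasks, predicted makespan).

    Assumes homogeneous tasks that share the work evenly and execute in
    ceil(tasks / workers) sequential waves.
    """
    best = None
    for p in range(1, max_tasks + 1):
        per_task = predict_makespan(
            segments, total_in / p, total_out / p, total_comp / p)
        makespan = math.ceil(p / workers) * per_task
        if best is None or makespan < best[1]:
            best = (p, makespan)
    return best
```

Because every task is described by the same three quantities, the search in this sketch runs over a single integer (the task count) rather than over per-task assignments, which is the kind of dimensionality reduction the homogeneous case allows.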
