论文信息 - Performance-Aware Fair Scheduling: Exploiting Demand Elasticity of Data Analytics Jobs

Performance-Aware Fair Scheduling: Exploiting Demand Elasticity of Data Analytics Jobs

Efficient resource management is of paramount importance in today's production clusters. In this paper, we identify the demand elasticity of data-parallel jobs. Demand elasticity allows jobs to run with a significantly less amount of resources than they ideally need, at the expense of only a modest performance penalty. Our EC2 experiment using popular Spark benchmark suites confirms that running a job using 50% of demanded slots is sufficient to achieve at least 75% of the ideal performance. We show that such an elasticity is an intrinsic property of data-parallel jobs and can be exploited to speed up average job completion. In this regard, we propose Performance-Aware Fair (PAF) scheduler to identify the demand elasticity and use it to improve the average job performance, while still attaining near-optimal isolation guarantee close to fair sharing. PAF starts with a fair allocation and iteratively adjusts it by transferring resources from one job to another, improving the performance of resource-taker without penalizing resource-giver by a noticeable amount. We implemented PAF in Spark and evaluated its effectiveness through both EC2 experiments and large-scale simulations. Evaluation results show that compared with fair allocation, PAF improves the average job performance by 13%, while penalizing resource-givers by no more than 1%.

Bo Li | Chen Chen | Wei Wang

[1] Luiz André Barroso,et al. The tail at scale , 2013, CACM.

[2] Scott Shenker,et al. Choosy: max-min fair sharing for datacenter jobs with constraints , 2013, EuroSys '13.

[3] Bo Li,et al. Coflex: Navigating the fairness-efficiency tradeoff for coflow scheduling , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.

[4] Scott Shenker,et al. Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.

[5] Magdalena Balazinska,et al. SkewTune: mitigating skew in mapreduce applications , 2012, SIGMOD Conference.

[6] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[7] Ion Stoica,et al. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics , 2016, NSDI.

[8] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[9] Scott Shenker,et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[10] Roy H. Campbell,et al. ARIA: automatic resource inference and allocation for mapreduce environments , 2011, ICAC '11.

[11] Anirban Dasgupta,et al. On scheduling in map-reduce and flow-shops , 2011, SPAA '11.