Paper Information - Resource Management for Parallel Processing Frameworks with Load Awareness at Worker Side


Many resource management systems and large-scale data processing frameworks use a reservation-based model for managing resources and scheduling tasks. We observe from the reported traces of Facebook and Google that this model leads to wasted resources because tasks do not effectively use the resources allocated to them. We confirm the problem with a trace of our production cluster. We propose an algorithm to estimate the resource usage at worker nodes. This estimate is used as an input to the scheduler at the resource manager. We verify the stability of the new system in a simulator and develop a prototype of this approach for YARN. Our simulation results show that the new model can flexibly match the actual demand of the workload to the capacity of the cluster, avoiding over-reservation of resources by users. Comparing the worst-case scenario of our management model with the best-case scenario of the reservation model, we obtain almost the same performance and comparable system stability. In practice, our prototype for YARN completes jobs 23% to 44% faster.
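
The contrast between scheduling on reservations and scheduling on estimated usage can be illustrated with a minimal sketch. The abstract does not specify the estimation algorithm, so everything here is an assumption made for illustration: the exponentially weighted moving average (EWMA) estimator, the `Worker` class, the `pick_worker` function, and the smoothing factor `alpha` are all hypothetical and are not taken from the paper or from YARN.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical worker node that tracks reservations and estimates real load."""
    name: str
    capacity: float               # total resource units, e.g. CPU cores
    reserved: float = 0.0         # sum of user-declared reservations
    estimated_usage: float = 0.0  # smoothed estimate of measured usage
    alpha: float = 0.5            # EWMA smoothing factor (assumed value)

    def observe(self, measured_usage: float) -> None:
        # Blend the new measurement into the running estimate (EWMA).
        # This estimator is an assumption, not the paper's algorithm.
        self.estimated_usage = (
            self.alpha * measured_usage
            + (1 - self.alpha) * self.estimated_usage
        )

    def free_by_reservation(self) -> float:
        # Free capacity as seen by a reservation-based scheduler.
        return self.capacity - self.reserved

    def free_by_estimate(self) -> float:
        # Free capacity as seen by a load-aware scheduler.
        return self.capacity - self.estimated_usage


def pick_worker(workers, demand, load_aware):
    """Return the worker with the most free capacity that fits `demand`, or None."""
    free = Worker.free_by_estimate if load_aware else Worker.free_by_reservation
    candidates = [w for w in workers if free(w) >= demand]
    return max(candidates, key=free, default=None)


# A node that is fully reserved but only lightly used in practice.
w = Worker("node-1", capacity=8.0, reserved=8.0)
for sample in (2.0, 2.0, 2.0):
    w.observe(sample)

by_reservation = pick_worker([w], 4.0, load_aware=False)  # no room on paper
by_estimate = pick_worker([w], 4.0, load_aware=True)      # room in practice
```

In this toy setup the reservation view reports no free capacity on the node, while the estimate view still finds room for a task that needs 4 units. A real system would also need a policy for the case where the estimate is too optimistic, which this sketch does not address.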
