Quantifying the Brown Side of Priority Schedulers: Lessons from Big Clusters

Scheduling is central to achieving "green" data centers, i.e., to distributing diverse workloads across heterogeneous resources in an energy-efficient manner. Taking the opposite perspective from most related work, this paper reveals the "brown" side of scheduling, i.e., wasted core seconds (so-called brown resources), using field analysis and trace-driven simulation of a Google cluster trace. First, based on the trace, we pinpoint the dependency between priority scheduling and task eviction that causes brown resources, and we present a brief characterization study focusing on workload priorities. Next, to better understand and further reduce the resource "inefficiency" of priority scheduling, we develop a slot-based scheduler and simulator with various tunable system parameters. Our key finding is that low-priority tasks suffer greatly in terms of both response time and CPU resources because they have a high probability of being evicted and resubmitted. We propose simple threshold-based policies that consider the trade-off between task drop rates and the core seconds wasted when evicted tasks are resubmitted. Our experimental results show that these policies effectively mitigate brown resources without sacrificing the performance advantages of priority scheduling.
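
To make the notions of wasted core seconds and a threshold-based policy concrete, the following is a minimal sketch rather than the scheduler or simulator developed in this paper. It assumes that brown resources can be approximated as the core seconds consumed by attempts that ended in eviction, and that a policy drops a task once it has been evicted a fixed number of times. The names `Attempt`, `wasted_core_seconds`, `apply_threshold`, and `max_evictions` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attempt:
    """One execution attempt of a task (hypothetical record layout)."""
    cores: float     # CPU cores held during the attempt
    seconds: float   # time the attempt ran before finishing or being evicted
    evicted: bool    # True if the attempt ended in eviction


def wasted_core_seconds(attempts: List[Attempt]) -> float:
    """Sum core seconds of attempts that ended in eviction (assumed definition)."""
    return sum(a.cores * a.seconds for a in attempts if a.evicted)


def apply_threshold(attempts: List[Attempt], max_evictions: int) -> Tuple[List[Attempt], bool]:
    """Replay a task's attempts, dropping the task after max_evictions evictions.

    Returns the attempts that actually run under the policy and a flag
    telling whether the task was dropped instead of being resubmitted.
    """
    kept: List[Attempt] = []
    evictions = 0
    for attempt in attempts:
        kept.append(attempt)
        if attempt.evicted:
            evictions += 1
            if evictions >= max_evictions:
                return kept, True  # task dropped, later attempts never run
    return kept, False


# A low-priority task that is evicted twice before it finally completes.
history = [
    Attempt(cores=2.0, seconds=100.0, evicted=True),
    Attempt(cores=2.0, seconds=150.0, evicted=True),
    Attempt(cores=2.0, seconds=300.0, evicted=False),
]

for threshold in (1, 2, 3):
    kept, dropped = apply_threshold(history, threshold)
    print(threshold, wasted_core_seconds(kept), dropped)
```

Under these assumptions, a lower threshold wastes fewer core seconds but drops the task, while a higher threshold lets the task complete at the cost of more wasted work; this is the trade-off between drop rates and brown resources that the threshold-based policies are meant to balance.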