Market-based cluster resource management

The thesis of this work is that applying economic metaphors to cluster resource management delivers significantly more value to users than traditional approaches. In particular, we hypothesize that explicit resource valuations and price feedback can lead to substantially higher value delivered to users. To support our thesis, we demonstrate feasibility through practical implementations of prototype systems and quantify the advantages through extensive simulations. For feasibility, we present a system architecture for market-based cluster resource management and describe two prototype implementations that realize it, one for time-shared environments and one for batch environments. Both were deployed on subsets of a 300+ CPU cluster of SMPs and used by a real user community. To quantify the benefits of market-based systems systematically, we perform an extensive set of simulations of a 32-node cluster. We analyze the sensitivity of performance to the distribution of resource valuations and to workload parameters, including job arrival processes, job resource requirements, workload burstiness, and utilization. Using these results, we pinpoint the types of workloads and the regimes of operation in which market-based systems have the most impact. Results show that market-based systems deliver substantially higher value than traditional approaches for both time-shared and batch systems. For time-shared systems, explicit resource valuations are highly beneficial: value delivered improves substantially (2x–5x for typical workloads) when resource valuations are known compared to when they are not.
Price feedback, by contrast, is only moderately effective, yielding at most a 30% improvement in value delivered, because knowing valuations already gives the system enough information to focus on high-valued uses of resources during contention. For batch systems, knowing valuations leads to large performance improvements similar to those in the time-shared case. However, adequately parameterized supercomputing center policies that use multiple FIFO queues with different priorities are almost as effective as a first-price auction with time-varying bids for sequential jobs, though substantially less effective for parallel workloads. (Abstract shortened by UMI.)
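
As an illustration of the batch comparison described above, the following is a minimal sketch of a single node that schedules sequential jobs either in FIFO order or by a first-price auction over time-varying bids. It is not the dissertation's simulator. The linear decay of a job's value with queueing delay, the rule that value is realized when a job starts, and all names (Job, run, fifo, first_price) are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float  # submission time
    runtime: float  # execution time on the single node
    value: float    # value to the user if the job starts with no delay
    decay: float    # value lost per unit of queueing delay (assumed linear)

    def bid(self, now: float) -> float:
        """Time-varying bid: what starting this job at time `now` is worth."""
        return max(0.0, self.value - self.decay * (now - self.arrival))


def run(jobs, pick):
    """Simulate one node; `pick(queue, now)` selects the next job to start.

    Returns the total value delivered, where each job contributes its bid
    at the moment it starts (an assumption of this sketch).
    """
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, now, total = [], 0.0, 0.0
    while pending or queue:
        if not queue:
            # Node is idle: jump ahead to the next arrival.
            now = max(now, pending[0].arrival)
        while pending and pending[0].arrival <= now:
            queue.append(pending.pop(0))
        job = pick(queue, now)
        queue.remove(job)
        total += job.bid(now)
        now += job.runtime
    return total


def fifo(queue, now):
    """Traditional policy: earliest arrival first, valuations ignored."""
    return min(queue, key=lambda j: j.arrival)


def first_price(queue, now):
    """Market policy: the highest current bid wins the node."""
    return max(queue, key=lambda j: j.bid(now))


jobs = [
    Job("A", arrival=0, runtime=10, value=10, decay=0),
    Job("B", arrival=0, runtime=2, value=100, decay=5),
    Job("C", arrival=0, runtime=2, value=50, decay=1),
]
print("FIFO value:       ", run(jobs, fifo))
print("First-price value:", run(jobs, first_price))
```

In this toy workload the long, low-value job A blocks the urgent jobs under FIFO, while the auction runs the highest bidder first. The gap depends entirely on the assumed valuations and decay rates, so it should not be read as reproducing the reported results.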