Cost-Aware Cooperative Resource Provisioning for Heterogeneous Workloads in Data Centers

Recent cost analysis shows that server cost still dominates the total cost of large-scale data centers and cloud systems. In this paper, we argue for a new twist on the classical resource provisioning problem: heterogeneous workloads are a fact of life in large-scale data centers, and current resource provisioning solutions do not act upon this heterogeneity. Our contributions are threefold. First, we propose a cooperative resource provisioning solution that takes advantage of the differences among heterogeneous workloads to decrease their peak resource consumption under competitive conditions. Second, for four typical heterogeneous workloads (parallel batch jobs, web servers, search engines, and MapReduce jobs), we build an agile system, PhoenixCloud, that enables cooperative resource provisioning. Third, we perform a comprehensive evaluation using both real and synthetic workload traces. Our experiments show that our solution can substantially reduce server cost compared with the noncooperative solutions that are widely used in state-of-the-practice hosting data centers and cloud systems: for example, EC2, which leverages the statistical multiplexing technique, or RightScale, which roughly implements the elastic resource provisioning technique proposed in related state-of-the-art work.
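
The intuition behind decreasing peak resource consumption can be illustrated with a small arithmetic sketch. It is not the PhoenixCloud provisioning policy; it only assumes that each workload is described by a per-interval server demand, and it compares provisioning every workload for its own peak with provisioning one shared pool for the peak of the combined demand. The function names and the demand numbers are hypothetical.

```python
from typing import Sequence


def noncooperative_peak(demands: Sequence[Sequence[int]]) -> int:
    """Servers needed when each workload is provisioned for its own peak."""
    return sum(max(d) for d in demands)


def cooperative_peak(demands: Sequence[Sequence[int]]) -> int:
    """Servers needed when workloads share one pool: peak of the summed demand."""
    return max(sum(step) for step in zip(*demands))


# Hypothetical per-interval server demand for two workloads whose peaks
# fall at different times (illustrative numbers, not from the paper).
batch_jobs = [8, 8, 6, 2, 1, 1]
web_servers = [1, 2, 3, 7, 8, 6]

separate = noncooperative_peak([batch_jobs, web_servers])
shared = cooperative_peak([batch_jobs, web_servers])
print(f"separate pools: {separate} servers, shared pool: {shared} servers")
```

The shared-pool figure can never exceed the separate-pool figure, and the gap grows as the workloads' peaks move further apart in time. When all peaks coincide, the two figures are equal and sharing yields no saving.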
