Merkat: Market-based Autonomous Application and Resource Management in the Cloud

Organizations that own High Performance Computing (HPC) infrastructures face difficulties in managing their resources. These difficulties come from the need to provide concurrent resource access to applications with different resource requirements, while considering that users are selfish and might have different performance objectives, or Service Level Objectives (SLOs), when executing them. To address these challenges, this paper proposes Merkat, a market-based SLO-driven cloud platform. Merkat relies on a market-based model to allocate resources to applications while taking advantage of virtualization technologies and on-demand provisioning to maximize resource utilization. Merkat's resource market uses a combination of currency distribution and dynamic resource pricing to ensure proper resource distribution while decentralizing resource control. In Merkat, autonomous controllers apply adaptation policies to scale the application's resource demand according to the user's SLO. The adaptation policies can: (i) dynamically tune the amount of CPU and memory provisioned for the virtual machines in contention periods; (ii) dynamically change the number of virtual machines. We evaluated the proposed platform in simulation and on the Grid'5000 testbed. Results show that Merkat: (i) provides flexible support for different application types and different SLOs; (ii) increases the resource utilization of the infrastructure; and (iii) provides good user satisfaction compared to existing centralized systems.
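As a rough illustration of how currency and dynamic pricing can interact with an SLO-driven controller, the sketch below assumes a proportional-share rule: each application receives a fraction of the capacity equal to its share of the total bids, and the price rises with aggregate demand. The abstract does not specify Merkat's allocation rule or its adaptation policies, so the functions `proportional_share` and `adapt_bid`, the `step` parameter, and the bid-adjustment logic are all hypothetical and are not the authors' implementation.

```python
def proportional_share(bids, capacity):
    """Split `capacity` among bidders in proportion to their bids.

    Assumed rule for illustration only; returns (allocation, price),
    where price is the currency paid per resource unit.
    """
    total = sum(bids.values())
    if total <= 0:
        return {name: 0.0 for name in bids}, 0.0
    price = total / capacity
    allocation = {name: capacity * bid / total for name, bid in bids.items()}
    return allocation, price


def adapt_bid(bid, measured, target, budget, step=0.1):
    """Hypothetical controller step: bid more when the measured
    performance is below the SLO target, bid less otherwise.
    The bid never exceeds the application's currency budget."""
    if measured < target:
        return min(budget, bid * (1 + step))
    return max(0.0, bid * (1 - step))


if __name__ == "__main__":
    # Two applications compete for 8 CPU cores.
    allocation, price = proportional_share({"a": 30, "b": 10}, capacity=8)
    print(allocation, price)
```

With these inputs, application `a` receives 6 cores, `b` receives 2, and the price is 5 currency units per core. Each controller acts only on its own measurements and budget, which is one way to read the abstract's point that resource control is decentralized.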
