Scheduling of data-intensive workloads in a brokered virtualized environment

Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, whose performance depends greatly on the available rates of data transfer between the computing and storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker’s incurred costs. Moreover, considering the brokered nature of such workloads, we define a payment model that gives these workloads an incentive to be scheduled as part of a batch, and we analyze this model theoretically. Finally, we evaluate the proposed scheduling algorithm and illustrate the fairness of the payment model in practical settings via trace-based experiments.
