D-factor: a quantitative model of application slow-down in multi-resource shared systems

Abstract: Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits of higher resource utilization include reduced cost to construct, operate, and maintain a system, which often includes energy consumption. Maximizing these benefits comes at a price: resource contention among jobs increases job completion time. In this paper, we analyze the slow-down of jobs due to contention for multiple resources in a system, which we refer to as the dilation factor. We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for dilation factors of jobs in multi-resource systems. A job is characterized by a vector-valued loading statistic, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure used to calculate the dilation factor (the loading matrix), and calculate the dilation factor of each job. We validate the accuracy of the model with multiple processes running on a native Linux server, on virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We extend the D-factor model to capture the slow-down of applications when multiple identical resources exist, as in multi-core and multi-disk environments. Validation of the extended D-factor model with HPC checkpoint applications on parallel file systems shows that it accurately captures the slow-down of concurrent applications in such environments.
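The abstract says that dilation factors are a quadratic function of per-job loading vectors but does not give the formula. The sketch below is only an illustration of what such a computation could look like, under assumptions that are not taken from the paper: each entry of a loading vector is taken to be the fraction of a job's standalone run time spent on one resource, and the dilation factor of a job is taken to be 1 plus the sum of inner products between its loading vector and those of its co-scheduled jobs. The function name `dilation_factors` and the sample values are hypothetical.

```python
from typing import List


def dilation_factors(loading: List[List[float]]) -> List[float]:
    """Return one illustrative dilation factor per job.

    loading[i][r] is ASSUMED to be the fraction of job i's standalone
    run time spent on resource r. The formula used here (1 plus the
    pairwise inner products with every other job) is an assumption made
    for illustration, not the formula from the paper.
    """
    n = len(loading)
    factors = []
    for i in range(n):
        overlap = 0.0
        for j in range(n):
            if j == i:
                continue  # a job does not contend with itself in this sketch
            # Inner product: large when both jobs load the same resources.
            overlap += sum(a * b for a, b in zip(loading[i], loading[j]))
        factors.append(1.0 + overlap)
    return factors


# Hypothetical loading matrix: one row per job, columns are (CPU, disk).
loading_matrix = [
    [0.8, 0.2],  # mostly CPU-bound job
    [0.1, 0.9],  # mostly disk-bound job
    [0.5, 0.5],  # balanced job
]

for job, d in enumerate(dilation_factors(loading_matrix)):
    print(f"job {job}: dilation factor {d:.2f}")
```

Under this assumed form, two jobs that load disjoint resources do not slow each other down, and a job running alone has a dilation factor of 1.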
