论文信息 - PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services

PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services

Multi-tenancy in modern datacenters is currently limited to a single latency-critical, interactive service, running alongside one or more low-priority, best-effort jobs. This limits the efficiency gains from multi-tenancy, especially as an increasing number of cloud applications are shifting from batch jobs to services with strict latency requirements. We present PARTIES, a QoS-aware resource manager that enables an arbitrary number of interactive, latency-critical services to share a physical node without QoS violations. PARTIES leverages a set of hardware and software resource partitioning mechanisms to adjust allocations dynamically at runtime, in a way that meets the QoS requirements of each co-scheduled workload, and maximizes throughput for the machine. We evaluate PARTIES on state-of-the-art server platforms across a set of diverse interactive services. Our results show that PARTIES improves throughput under QoS by 61% on average, compared to existing resource managers, and that the rate of improvement increases with the number of co-scheduled applications per physical host.

[1] Christoforos E. Kozyrakis,et al. Reconciling high server utilization and sub-millisecond quality-of-service , 2014, EuroSys '14.

[2] Adam Wierman,et al. Open Versus Closed: A Cautionary Tale , 2006, NSDI.

[3] Christoforos E. Kozyrakis,et al. Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[4] Lada A. Adamic,et al. Zipf's law and the Internet , 2002, Glottometrics.

[5] Christina Delimitrou,et al. Bolt: I Know What You Did Last Summer... In The Cloud , 2017, ASPLOS.

[6] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[7] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[8] Yuan He,et al. Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices , 2019, ASPLOS.

[9] James Charles,et al. Evaluation of the Intel® Core™ i7 Turbo Boost feature , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[10] Adam Silberstein,et al. Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[11] Daniel Sánchez,et al. Rubik: Fast analytical power management for latency-critical systems , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12] Xiao Zhang,et al. CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[13] Willy Zwaenepoel,et al. X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[14] Xiaodong Wang,et al. XChange: A market-based approach to scalable dynamic multi-resource allocation in multicore architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[15] Lingjia Tang,et al. SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[16] Pradeep Dubey,et al. Architecting to achieve a billion requests per second throughput on a single key-value store server platform , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[17] Engin Ipek,et al. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[18] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[19] Daniel Sánchez,et al. Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).

[20] Lingjia Tang,et al. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.

[21] Christina Delimitrou,et al. Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[22] Christina Delimitrou,et al. The Architectural Implications of Cloud Microservices , 2018, IEEE Computer Architecture Letters.

[23] Yuan He,et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems , 2019, ASPLOS.

[24] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[25] Brad Fitzpatrick,et al. Distributed caching with memcached , 2004 .

[26] Xiaodong Wang,et al. SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[27] Lingjia Tang,et al. Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[28] Paul Lamere,et al. Sphinx-4: a flexible open source framework for speech recognition , 2004 .

[29] Scott Shenker,et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[30] Song Jiang,et al. Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.

[31] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[32] M. Martonosi,et al. A Comparison of Capacity Management Schemes for Shared CMP Caches , 2008 .

[33] Jialin Li,et al. Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency , 2014, SoCC.

[34] Christina Delimitrou,et al. Tarcil: reconciling scheduling speed and quality in large shared clusters , 2015, SoCC.

[35] Christina Delimitrou,et al. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems , 2016, ASPLOS.

[36] Christina Delimitrou,et al. Workload characterization of interactive cloud services on big and small server platforms , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).

[37] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.

[38] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[39] Michael Abd-El-Malek,et al. Omega: flexible, scalable schedulers for large compute clusters , 2013, EuroSys '13.

[40] Christina Delimitrou,et al. QoS-Aware scheduling in heterogeneous datacenters with paragon , 2013, TOCS.

[41] Daniel Sánchez,et al. Ubik: efficient cache sharing with strict qos for latency-critical workloads , 2014, ASPLOS.

[42] Michael D. Smith,et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[43] Christina Delimitrou,et al. iBench: Quantifying interference for datacenter applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[44] Chris Callison-Burch,et al. Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding , 2006 .

[45] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[46] Azer Bestavros,et al. Colocation as a Service: Strategic and Operational Services for Cloud Colocation , 2010, 2010 Ninth IEEE International Symposium on Network Computing and Applications.

[47] Lakshmish Ramaswamy,et al. Cache Clouds: Cooperative Caching of Dynamic Documents in Edge Networks , 2005, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05).

[48] Fabien Hermenier,et al. Multi-objective job placement in clusters , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[49] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[50] Andrew V. Goldberg,et al. Quincy: fair scheduling for distributed computing clusters , 2009, SOSP '09.

[51] Christoforos E. Kozyrakis,et al. Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).