Rhythm: component-distinguishable workload deployment in datacenters

Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort batch (BE) jobs in datacenters. However, they usually treat an LC workload as a whole when allocating resources to BE jobs and neglect the different features of components of an LC workload. This kind of coarse-grained co-location method leaves a significant room for improvement in resource utilization. Based on the observation of the inconsistent interference tolerance abilities of different LC components, we propose a new abstraction called Servpod, which is a collection of a LC parts that are deployed on the same physical machine together, and show its merits on building a fine-grained co-location framework. The key idea is to differentiate the BE throughput launched with each LC Servpod, i.e., Servpod with high interference tolerance ability can be deployed along with more BE jobs. Based on Servpods, we present Rhythm, a co-location controller that maximizes the resource utilization while guaranteeing LC service's tail latency requirement. It quantifies the interference tolerance ability of each servpod through the analysis of tail-latency contribution. We evaluate Rhythm using LC services in forms of containerized processes and microservices, and find that it can improve the system throughput by 31.7%, CPU utilization by 26.2%, and memory bandwidth utilization by 34% while guaranteeing the SLA (service level agreement).

[1]  David A. Patterson,et al.  A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness , 2013, ISCA.

[2]  Alexandru Iosup,et al.  On the Performance Variability of Production Cloud Services , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[3]  Jie Liu,et al.  Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines , 2011, SoCC.

[4]  Xi Yang,et al.  Elfen Scheduling: Fine-Grain Principled Borrowing from Latency-Critical Workloads Using Simultaneous Multithreading , 2016, USENIX Annual Technical Conference.

[5]  Xiao Zhang,et al.  CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[6]  Nectarios Koziris,et al.  Improving QoS and Utilisation in modern multi-core servers with Dynamic Cache Partitioning , 2017, COSH/VisorHPC@HiPEAC.

[7]  Sameh Elnikety,et al.  PerfIso: Performance Isolation for Commercial Latency-Sensitive Services , 2018, USENIX Annual Technical Conference.

[8]  Richard Mortier,et al.  Constellation: automated discovery of service and host dependencies in networked systems , 2008 .

[9]  Guofei Jiang,et al.  PInfer: Learning to Infer Concurrent Request Paths from System Kernel Events , 2016, 2016 IEEE International Conference on Autonomic Computing (ICAC).

[10]  Tirthak Patel,et al.  CLITE: Efficient and QoS-Aware Co-Location of Multiple Latency-Critical Jobs for Warehouse Scale Computers , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[11]  T. S. Eugene Ng,et al.  The Impact of Virtualization on Network Performance of Amazon EC2 Data Center , 2010, 2010 Proceedings IEEE INFOCOM.

[12]  Donald Beaver,et al.  Dapper, a Large-Scale Distributed Systems Tracing Infrastructure , 2010 .

[13]  Lingjia Tang,et al.  SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  Rodrigo Fonseca,et al.  Retro: Targeted Resource Management in Multi-tenant Distributed Systems , 2015, NSDI.

[15]  Christina Delimitrou,et al.  The Architectural Implications of Cloud Microservices , 2018, IEEE Computer Architecture Letters.

[16]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[17]  Mattan Erez,et al.  A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC , 2012, DAC Design Automation Conference 2012.

[18]  Daniel Sánchez,et al.  Ubik: efficient cache sharing with strict qos for latency-critical workloads , 2014, ASPLOS.

[19]  Mark Sandler,et al.  Understanding latency variations of black box services , 2013, WWW.

[20]  R. Govindarajan,et al.  Probabilistic Shared Cache Management (PriSM) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[21]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[22]  Jianfeng Zhan,et al.  Precise, Scalable, and Online Request Tracing for Multitier Services of Black Boxes , 2012, IEEE Transactions on Parallel and Distributed Systems.

[23]  Tipp Moseley,et al.  Measuring interference between live datacenter applications , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[24]  Yuan He,et al.  An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems , 2019, ASPLOS.

[25]  Randy H. Katz,et al.  Wrangler: Predictable and Faster Jobs using Fewer Resources , 2014, SoCC.

[26]  Mahmut T. Kandemir,et al.  SHARP control: Controlled shared cache management in chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[27]  Woongki Baek,et al.  CoPart: Coordinated Partitioning of Last-Level Cache and Memory Bandwidth for Fairness-Aware Workload Consolidation on Commodity Servers , 2019, EuroSys.

[28]  Mattan Erez,et al.  Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems , 2016, ASPLOS.

[29]  Bo Li,et al.  iAware: Making Live Migration of Virtual Machines Interference-Aware in the Cloud , 2014, IEEE Transactions on Computers.

[30]  AilamakiAnastasia,et al.  Clearing the clouds , 2012 .

[31]  Calton Pu,et al.  Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments , 2010, 2010 IEEE 3rd International Conference on Cloud Computing.

[32]  Jorge-Arnulfo Quiané-Ruiz,et al.  Runtime measurements in the cloud , 2010, Proc. VLDB Endow..

[33]  T. A. Tyukina,et al.  Law of the Minimum Paradoxes , 2009, Bulletin of mathematical biology.

[34]  Yun Chen,et al.  Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD) , 2015, ASPLOS.

[35]  Christopher Stewart,et al.  Performance modeling and system management for multi-component online services , 2005, NSDI.

[36]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[37]  Xu Chen,et al.  Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions , 2008, OSDI.

[38]  LiYupeng,et al.  Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD) , 2015 .

[39]  Vladimir Vlassov,et al.  Stay-Away, protecting sensitive applications from performance interference , 2014, Middleware.

[40]  Karsten Schwan,et al.  E2EProf: Automated End-to-End Performance Management for Enterprise Systems , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[41]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[42]  Aman Kansal,et al.  Q-clouds: managing performance interference effects for QoS-aware clouds , 2010, EuroSys '10.

[43]  Wanling Gao,et al.  A Dwarf-based Scalable Big Data Benchmarking Methodology , 2017, ArXiv.

[44]  Ahmed E. Hassan,et al.  Automatic detection of performance deviations in the load testing of Large Scale Systems , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[45]  Saurabh Bagchi,et al.  ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services , 2015, 2015 IEEE International Conference on Autonomic Computing.

[46]  Daniel A. Menascé,et al.  TPC-W: A Benchmark for E-Commerce , 2002, IEEE Internet Comput..

[47]  Randy H. Katz,et al.  X-Trace: A Pervasive Network Tracing Framework , 2007, NSDI.

[48]  Antti Ylä-Jääski,et al.  Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2 , 2012, HotCloud.

[49]  Thomas F. Wenisch,et al.  The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services , 2014, OSDI.

[50]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[51]  Christina Delimitrou,et al.  PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services , 2019, ASPLOS.

[52]  Richard Mortier,et al.  Using Magpie for Request Extraction and Workload Modelling , 2004, OSDI.

[53]  Rajiv Nishtala,et al.  Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[54]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.

[55]  Marcos K. Aguilera,et al.  Performance debugging for distributed systems of black boxes , 2003, SOSP '03.

[56]  Zhibin Yu,et al.  The Elasticity and Plasticity in Semi-Containerized Co-locating Cloud Workload: a View from Alibaba Trace , 2018, SoCC.

[57]  Prashant J. Shenoy,et al.  Empirical evaluation of latency-sensitive application performance in the cloud , 2010, MMSys '10.

[58]  Kaushik Veeraraghavan,et al.  Canopy: An End-to-End Performance Tracing And Analysis System , 2017, SOSP.

[59]  Bowen Zhou,et al.  Pythia: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads , 2018, Middleware.

[60]  José Duato,et al.  An empirical model for predicting cross-core performance interference on multicore processors , 2013, PACT 2013.

[61]  David A. Patterson,et al.  Cloud Programming Simplified: A Berkeley View on Serverless Computing , 2019, ArXiv.

[62]  Xing Pu,et al.  Performance Analysis of Network I/O Workloads in Virtualized Data Centers , 2013, IEEE Transactions on Services Computing.

[63]  Lingjia Tang,et al.  GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks , 2019, EuroSys.

[64]  Christina Delimitrou,et al.  iBench: Quantifying interference for datacenter applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[65]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[66]  Randy H. Katz,et al.  Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.

[67]  Pengfei Chen,et al.  CauseInfer: Automated End-to-End Performance Diagnosis with Hierarchical Causality Graph in Cloud Environment , 2019, IEEE Transactions on Services Computing.

[68]  Ricardo Bianchini,et al.  DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments , 2013, USENIX Annual Technical Conference.

[69]  Julio César López-Hernández,et al.  Stardust: tracking activity in a distributed storage system , 2006, SIGMETRICS '06/Performance '06.

[70]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[71]  Jing Guo,et al.  Who Limits the Resource Efficiency of My Datacenter: An Analysis of Alibaba Datacenter Traces , 2019, 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS).

[72]  Christina Delimitrou,et al.  HCloud: Resource-Efficient Provisioning in Shared Cloud Systems , 2016, ASPLOS.

[73]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[74]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[75]  Xiaobing Feng,et al.  Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis , 2016, IEEE Transactions on Parallel and Distributed Systems.