CLITE: Efficient and QoS-Aware Co-Location of Multiple Latency-Critical Jobs for Warehouse Scale Computers

Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements alongside throughput-oriented background jobs, which need to achieve high performance. Previous work has proposed methods that cannot co-locate multiple latency-critical jobs with multiple background jobs while (1) meeting the QoS requirements of all latency-critical jobs and (2) maximizing the performance of the background jobs. This paper proposes CLITE, a Bayesian Optimization-based, multi-resource partitioning technique that achieves both goals. CLITE is publicly available at https://github.com/GoodwillComputingLab/CLITE.
