QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope

Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these applications are subject to suboptimal performance, resource inefficiency, and high costs to a larger degree than previous simple serverless functions. We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for end-to-end serverless workflows that takes into account the inherent uncertainty present in FaaS platforms, and improves performance predictability and resource efficiency. Aquatope uses a set of scalable and validated Bayesian models to create pre-warmed containers ahead of function invocations, and to allocate appropriate resources at function granularity to meet a complex workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set of analytics and interactive multi-stage serverless workloads, Aquatope significantly outperforms prior systems, reducing QoS violations by 5×, and cost by 34% on average and up to 52% compared to other QoS-meeting methods.
