Architectural Implications of Function-as-a-Service Computing

Serverless computing is a rapidly growing cloud application model, popularized by Amazon's Lambda platform. Serverless cloud services provide fine-grained provisioning of resources, which scale automatically with user demand. Function-as-a-Service (FaaS) applications follow this serverless model: the developer provides the application as a set of functions that are executed in response to user- or system-generated events. Functions are designed to be short-lived and execute inside containers or virtual machines, introducing a range of system-level overheads. This paper studies the architectural implications of this emerging paradigm, using the commercial-grade Apache OpenWhisk FaaS platform on real servers. The workloads, along with the way that FaaS inherently interleaves short functions from many tenants, frustrate many of the locality-preserving architectural structures common in modern processors. In particular, we find that: FaaS containerization brings up to 20x slowdown compared to native execution; cold start can take over 10x a short function's execution time; branch mispredictions per kilo-instruction are 20x higher for short functions; memory bandwidth usage increases by 6x due to the invocation pattern; and IPC decreases by as much as 35% due to inter-function interference. We open-source FaaSProfiler, the FaaS testing and profiling platform that we developed for this work.
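The two microarchitectural metrics quoted above, branch mispredictions per kilo-instruction (MPKI) and instructions per cycle (IPC), are derived from raw hardware counter values using their standard definitions. The following is a minimal sketch of that arithmetic, not the method used by FaaSProfiler; the function names and the sample counter values are hypothetical and chosen only for illustration.

```python
def mpki(branch_mispredictions: int, instructions: int) -> float:
    """Branch mispredictions per 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return branch_mispredictions * 1000.0 / instructions


def ipc(instructions: int, cycles: int) -> float:
    """Retired instructions per core clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def relative_change(baseline: float, observed: float) -> float:
    """Fractional change of observed relative to baseline (negative = decrease)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counter readings, not measurements from the paper.
    isolated_ipc = ipc(instructions=2_000_000, cycles=1_000_000)
    interfered_ipc = ipc(instructions=2_000_000, cycles=1_600_000)
    print("MPKI:", mpki(branch_mispredictions=5_000, instructions=1_000_000))
    print("IPC change:", relative_change(isolated_ipc, interfered_ipc))
```

Expressing mispredictions per kilo-instruction rather than as a raw count makes short and long functions comparable, since the metric is normalized by the amount of work executed.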
