Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting

Serverless computing promises cost-efficiency and elasticity for high-productive software development. To achieve this, the serverless sandbox system must address two challenges: strong isolation between function instances, and low startup latency to ensure user experience. While strong isolation can be provided by virtualization-based sandboxes, the initialization of sandbox and application causes non-negligible startup overhead. Conventional sandbox systems fall short in low-latency startup due to their application-agnostic nature: they can only reduce the latency of sandbox initialization through hypervisor and guest kernel customization, which is inadequate and does not mitigate the majority of startup overhead. This paper proposes Catalyzer, a serverless sandbox system design providing both strong isolation and extremely fast function startup. Instead of booting from scratch, Catalyzer restores a virtualization-based function instance from a well-formed checkpoint image and thereby skips the initialization on the critical path (init-less). Catalyzer boosts the restore performance by on-demand recovering both user-level memory state and system state. We also propose a new OS primitive, sfork (sandbox fork), to further reduce the startup latency by directly reusing the state of a running sandbox instance. Fundamentally, Catalyzer removes the initialization cost by reusing state, which enables general optimizations for diverse serverless functions. The evaluation shows that Catalyzer reduces startup latency by orders of magnitude, achieves < 1ms latency in the best case, and significantly reduces the end-to-end latency for real-world workloads. Catalyzer has been adopted by Ant Financial, and we also present lessons learned from industrial development.

[1]  Mengyuan Li,et al.  Peeking Behind the Curtains of Serverless Platforms , 2018, USENIX Annual Technical Conference.

[2]  Stephen Ennis,et al.  Cloud Event Programming Paradigms: Applications and Analysis , 2016, 2016 IEEE 9th International Conference on Cloud Computing (CLOUD).

[3]  Andrea C. Arpaci-Dusseau,et al.  SOCK: Rapid Task Provisioning with Serverless-Optimized Containers , 2018, USENIX Annual Technical Conference.

[4]  Michael Vrable,et al.  Scalability, fidelity, and containment in the potemkin virtual honeyfarm , 2005, SOSP '05.

[5]  Christoforos E. Kozyrakis,et al.  Pocket: Elastic Ephemeral Storage for Serverless Analytics , 2018, OSDI.

[6]  Florian Schmidt,et al.  My VM is Lighter (and Safer) than your Container , 2017, SOSP.

[7]  Andrea C. Arpaci-Dusseau,et al.  Pipsqueak: Lean Lambdas with Large Libraries , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW).

[8]  Arvind Krishnamurthy,et al.  E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers , 2019, USENIX Annual Technical Conference.

[9]  Don Marti,et al.  OSv - Optimizing the Operating System for Virtual Machines , 2014, USENIX Annual Technical Conference.

[10]  Changwoo Min,et al.  Scaling Guest OS Critical Sections with eCS , 2018, USENIX Annual Technical Conference.

[11]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.

[12]  Ding Yuan,et al.  Don't Get Caught in the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems , 2016, OSDI.

[13]  Abel Gordon,et al.  Paravirtual Remote I/O , 2016, ASPLOS.

[14]  Jon Crowcroft,et al.  Unikernels: library operating systems for the cloud , 2013, ASPLOS '13.

[15]  Alex Landau,et al.  ELI: bare-metal performance for I/O virtualization , 2012, ASPLOS XVII.

[16]  Paarijaat Aditya,et al.  SAND: Towards High-Performance Serverless Computing , 2018, USENIX Annual Technical Conference.

[17]  Peng Wu,et al.  Replayable Execution Optimized for Page Sharing for a Managed Runtime Environment , 2019, EuroSys.

[18]  Jon Crowcroft,et al.  Jitsu: Just-In-Time Summoning of Unikernels , 2015, NSDI.

[19]  Christoforos E. Kozyrakis,et al.  Usenix Association 10th Usenix Symposium on Operating Systems Design and Implementation (osdi '12) 335 Dune: Safe User-level Access to Privileged Cpu Features , 2022 .

[20]  Eyal de Lara,et al.  SnowFlock: rapid virtual machine cloning for cloud computing , 2009, EuroSys '09.

[21]  Christina Delimitrou,et al.  X-Containers: Breaking Down Barriers to Improve Performance and Isolation of Cloud-Native Containers , 2019, ASPLOS.

[22]  Wenke Lee,et al.  How to Make ASLR Win the Clone Wars: Runtime Re-Randomization , 2016, NDSS.

[23]  David G. Andersen,et al.  Putting the "Micro" Back in Microservice , 2018, USENIX Annual Technical Conference.

[24]  Liang Zhang,et al.  Picocenter: supporting long-lived, mostly-idle applications in cloud environments , 2016, EuroSys.

[25]  Wenke Lee,et al.  ASLR-Guard: Stopping Address Space Leakage for Code Reuse Attacks , 2015, CCS.

[26]  Christoforos E. Kozyrakis,et al.  Understanding Ephemeral Storage for Serverless Analytics , 2018, USENIX Annual Technical Conference.

[27]  Yuan He,et al.  An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems , 2019, ASPLOS.

[28]  Nadav Amit,et al.  The Design and Implementation of Hyperupcalls , 2018, USENIX Annual Technical Conference.