On-demand-fork: a microsecond fork for memory-intensive and latency-sensitive applications

Fork has long been the process creation system call for Unix. At its inception, fork was hailed as an efficient system call due to its use of copy-on-write on memory shared between parent and child processes. However, application memory demand has increased drastically since the early days, and the cost fork incurs simply to set up virtual memory (e.g., copying page tables) is now a concern, even for applications that require only hundreds of MB of memory. In practice, fork performance already holds back system efficiency and latency across a range of use cases that fork large processes, such as fault-tolerant systems, serverless frameworks, and testing frameworks. This paper proposes On-demand-fork, a fast implementation of the fork system call specifically designed for applications with large memory footprints. On-demand-fork relies on the observation that copy-on-write can be generalized to page tables, even on commodity hardware. On-demand-fork executes faster than the traditional fork implementation by additionally sharing page tables between parent and child at fork time and selectively copying page tables in small chunks, on demand, when handling page faults. On-demand-fork is a drop-in replacement for fork that requires no changes to applications or hardware. We evaluated On-demand-fork on a range of micro-benchmarks and real-world workloads. On-demand-fork significantly reduces fork invocation time and scales better as process memory grows. For processes with 1 GB of allocated memory, On-demand-fork has a 65× performance advantage over fork. We also evaluated On-demand-fork on testing, fuzzing, and snapshotting workloads of well-known applications, obtaining execution throughput improvements between 59% and 226% and up to 99% invocation latency reduction.
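The idea of extending copy-on-write from data pages to page tables can be illustrated with a toy model. The sketch below is not the kernel implementation described in the paper: the class and method names (PageTable, AddressSpace, classic_fork, on_demand_fork, write) are hypothetical, page tables are plain Python lists, and the "small chunk" copied on a fault is assumed to be one last-level table of 512 entries, which is the x86-64 layout with 4 KiB pages. It only counts how many page-table entries each strategy copies, and when.

```python
ENTRIES_PER_TABLE = 512  # x86-64: one last-level table holds 512 entries (maps 2 MiB of 4 KiB pages)


class PageTable:
    """Toy last-level page table: frame numbers plus a share count."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.refcount = 1  # number of address spaces pointing at this table


class AddressSpace:
    """Toy address space: an ordered list of last-level page tables."""

    def __init__(self, num_pages):
        self.tables = []
        for start in range(0, num_pages, ENTRIES_PER_TABLE):
            stop = min(start + ENTRIES_PER_TABLE, num_pages)
            self.tables.append(PageTable(range(start, stop)))
        self.entries_copied = 0  # page-table entries copied on behalf of this space

    def classic_fork(self):
        """Model of traditional fork: copy every page-table entry up front."""
        child = AddressSpace(0)
        for table in self.tables:
            child.tables.append(PageTable(table.entries))
            child.entries_copied += len(table.entries)
        return child

    def on_demand_fork(self):
        """Model of the shared-table idea: share tables, copy nothing at fork time."""
        child = AddressSpace(0)
        for table in self.tables:
            table.refcount += 1
            child.tables.append(table)
        return child

    def write(self, page, new_frame):
        """Model of a write fault: privatize the one table covering `page` if shared."""
        index, offset = divmod(page, ENTRIES_PER_TABLE)
        table = self.tables[index]
        if table.refcount > 1:
            table.refcount -= 1
            table = PageTable(table.entries)  # copy only this table
            self.tables[index] = table
            self.entries_copied += len(table.entries)
        table.entries[offset] = new_frame


if __name__ == "__main__":
    parent = AddressSpace((1 << 30) // 4096)  # 1 GiB of 4 KiB pages
    eager = parent.classic_fork()
    lazy = parent.on_demand_fork()
    lazy.write(5, 999_999)
    print(len(parent.tables), eager.entries_copied, lazy.entries_copied)
```

In this model a 1 GiB process has 262,144 page-table entries spread over 512 last-level tables. The eager copy touches all 262,144 entries during the fork call, while the shared variant copies none at fork time and only 512 entries when the child first writes to a page. The model ignores upper-level tables, TLB maintenance, and data-page copy-on-write, so it says nothing about the measured speedups reported above.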
