Kappa: a programming framework for serverless computing

Serverless computing has recently emerged as a new paradigm for running software on the cloud. In this paradigm, programs need to be expressed as a set of short-lived tasks, each of which must complete within a short, bounded time (e.g., 15 minutes on AWS Lambda). Serverless computing benefits cloud providers, by allowing them to better utilize resources, and users, by simplifying management and enabling greater elasticity. However, developing applications to run in this environment is challenging: users must appropriately partition their code, develop new coordination mechanisms, and deal with failure recovery. In this paper, we propose Kappa, a framework that simplifies serverless development. It uses checkpointing to handle lambda function timeouts, and provides concurrency mechanisms that enable parallel computation and coordination.
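
To make the checkpointing idea concrete, the following is a minimal sketch of how a long-running computation might survive a per-invocation time limit by saving its state and resuming in a later invocation. It is not Kappa's API: the names `STORE`, `save_checkpoint`, `load_checkpoint`, `handler`, and `run_to_completion` are hypothetical, an in-memory dictionary stands in for durable storage, and a step budget stands in for the platform's timeout.

```python
import pickle

# Hypothetical in-memory stand-in for durable storage (e.g., an object store).
STORE = {}


def save_checkpoint(key, state):
    """Serialize the task state so a later invocation can resume from it."""
    STORE[key] = pickle.dumps(state)


def load_checkpoint(key):
    """Return the last saved state for `key`, or None if there is none."""
    blob = STORE.get(key)
    return pickle.loads(blob) if blob is not None else None


def handler(key, n, budget):
    """One time-bounded invocation: sums 0..n-1, doing at most `budget` steps.

    `budget` is an assumed stand-in for the platform's timeout.
    """
    state = load_checkpoint(key) or {"i": 0, "total": 0}
    steps = 0
    while state["i"] < n:
        if steps == budget:
            # Out of time: persist progress and let a new invocation resume.
            save_checkpoint(key, state)
            return {"done": False}
        state["total"] += state["i"]
        state["i"] += 1
        steps += 1
    return {"done": True, "result": state["total"]}


def run_to_completion(key, n, budget):
    """Re-invoke the handler until the task finishes; count the invocations."""
    invocations = 0
    while True:
        invocations += 1
        out = handler(key, n, budget)
        if out["done"]:
            return out["result"], invocations
```

In this sketch the programmer writes the checkpoint logic by hand; the abstract describes a framework that handles timeouts through checkpointing on the user's behalf, so this should be read only as an illustration of the underlying save-and-resume pattern.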
