DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration

Serverless computing, which runs functions with auto-scaling, is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can build complex functionality. Prior work adopts the control-flow paradigm to orchestrate serverless workflows. However, the control-flow paradigm inherently leads to long response latency because of heavy data-persistence overhead, sequential resource usage, and late function triggering. Our investigation shows that the data-flow paradigm can resolve these problems, given careful design and optimization. We propose DataFlower, a scheme that realizes the data-flow paradigm for serverless workflows. In DataFlower, a container is abstracted into a function logic unit and a data logic unit. The function logic unit runs the functions, and the data logic unit handles data transmission asynchronously. Moreover, a host-container collaborative communication mechanism supports efficient data transfer. Our experimental results show that, compared with state-of-the-art serverless designs, DataFlower reduces the 99th-percentile latency of the benchmarks by up to 35.4% and improves peak throughput by up to 3.8x.
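To make the data-flow idea concrete, the following is a minimal Python sketch, not DataFlower's implementation. It assumes a toy model in which each node stands in for a container: a function fires as soon as all of its named inputs have arrived, and a background thread plays the role of the data logic unit by forwarding results downstream asynchronously. The class `DataflowNode`, its methods, and the example workflow are all hypothetical names introduced for illustration.

```python
import queue
import threading


class DataflowNode:
    """Toy stand-in for a container (hypothetical, for illustration only).

    The function part runs `func` once every named input has arrived.
    The data part is a background thread that forwards results downstream.
    """

    def __init__(self, name, func, input_names):
        self.name = name
        self.func = func
        self.input_names = set(input_names)
        self.inputs = {}
        self.downstream = []               # list of (node, input_name) edges
        self.result = None
        self.done = threading.Event()
        self._lock = threading.Lock()
        self._outbox = queue.Queue()
        # Data logic unit: drains the outbox without blocking the function.
        threading.Thread(target=self._forward_loop, daemon=True).start()

    def connect(self, node, input_name):
        """Declare that this node's output feeds `input_name` of `node`."""
        self.downstream.append((node, input_name))

    def receive(self, input_name, value):
        """Accept one input; trigger the function when the input set is complete."""
        with self._lock:
            self.inputs[input_name] = value
            ready = set(self.inputs) == self.input_names
        if ready:
            # For brevity the function runs on the thread that delivered
            # the last input; no central orchestrator decides when to start it.
            self.result = self.func(**self.inputs)
            self._outbox.put(self.result)  # hand off to the data part, return at once
            self.done.set()

    def _forward_loop(self):
        while True:
            value = self._outbox.get()
            for node, input_name in self.downstream:
                node.receive(input_name, value)


# Hypothetical four-function workflow: split -> (count, upper) -> merge.
split = DataflowNode("split", lambda text: text.split(), ["text"])
count = DataflowNode("count", lambda words: len(words), ["words"])
upper = DataflowNode("upper", lambda words: [w.upper() for w in words], ["words"])
merge = DataflowNode("merge", lambda n, shouted: (n, shouted), ["n", "shouted"])

split.connect(count, "words")
split.connect(upper, "words")
count.connect(merge, "n")
upper.connect(merge, "shouted")

split.receive("text", "data flows downstream")  # inject the request
merge.done.wait(timeout=5)
print(merge.result)
```

In this sketch, data availability alone triggers `merge`; there is no controller that waits for `count` and `upper` to report completion before invoking the next step, which is the contrast with control-flow orchestration that the abstract draws. Data persistence, container scaling, and the host-container communication path are deliberately left out.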
