SONIC: Application-aware Data Passing for Chained Serverless Applications

Data analytics applications are increasingly leveraging serverless execution environments for their ease of use and pay-as-you-go billing. Such applications are often composed of multiple functions arranged in a workflow. However, the current approach of exchanging intermediate (ephemeral) data between functions through remote storage (such as S3) introduces significant performance overhead. We show that there are three alternative data-passing methods, which we call VM-Storage, Direct-Passing, and the state-of-practice Remote-Storage. Crucially, we show that no single data-passing method prevails under all scenarios and that the optimal choice depends on dynamic factors such as the size of input data, the size of intermediate data, the application's degree of parallelism, and network bandwidth. We propose SONIC, a data-passing manager that optimizes application performance and cost by transparently selecting the optimal data-passing method for each edge of a serverless workflow DAG and by implementing communication-aware function placement. SONIC monitors application parameters and uses simple regression models to adapt its hybrid data passing accordingly. We integrate SONIC with OpenLambda and evaluate the system on Amazon EC2 with three analytics applications that are popular in serverless environments. SONIC provides lower latency (raw performance) and higher performance/$ across diverse conditions, compared to four baselines: SAND [UsenixATC-18], Vanilla OpenLambda [HotCloud-16], OpenLambda integrated with Pocket [OSDI-18], and AWS Lambda (state of practice).
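
To make the per-edge selection idea concrete, the following is a minimal sketch and not SONIC's actual model. It assumes a toy latency estimate for each of the three methods and picks the cheapest one for a single DAG edge. The class `EdgeParams`, the functions `estimate_latency` and `choose_method`, all parameter names, and the latency formulas themselves are hypothetical and chosen only for illustration; the abstract states that SONIC uses regression models over monitored parameters, which this sketch does not reproduce.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeParams:
    """Hypothetical inputs for one edge of a workflow DAG."""
    data_mb: float          # intermediate data produced by the sender (MB)
    fan_out: int            # number of receiving functions on this edge
    compute_s: float        # run time of one receiving function (s)
    vm_slots: int           # functions a single VM can run concurrently
    disk_mbps: float        # local VM storage throughput (MB/s)
    vm_bw_mbps: float       # VM-to-VM network bandwidth (MB/s)
    remote_bw_mbps: float   # bandwidth to remote storage (MB/s)


def estimate_latency(p: EdgeParams) -> dict:
    """Toy latency estimates (seconds) per data-passing method.

    Assumptions (illustrative only):
    - VM-Storage keeps data local, so all receivers share the sender's VM
      and run in waves of at most `vm_slots` functions.
    - Direct-Passing sends one copy per receiver over the sender's link.
    - Remote-Storage uploads once, then receivers download in parallel.
    """
    waves = math.ceil(p.fan_out / p.vm_slots)
    return {
        "VM-Storage": p.data_mb / p.disk_mbps + waves * p.compute_s,
        "Direct-Passing": p.fan_out * p.data_mb / p.vm_bw_mbps + p.compute_s,
        "Remote-Storage": 2 * p.data_mb / p.remote_bw_mbps + p.compute_s,
    }


def choose_method(p: EdgeParams) -> str:
    """Return the method with the lowest estimated latency for this edge."""
    estimates = estimate_latency(p)
    return min(estimates, key=estimates.get)


if __name__ == "__main__":
    base = dict(compute_s=2.0, vm_slots=4, disk_mbps=500.0,
                vm_bw_mbps=100.0, remote_bw_mbps=20.0)
    for data_mb, fan_out in [(100.0, 1), (5.0, 5), (10.0, 16)]:
        edge = EdgeParams(data_mb=data_mb, fan_out=fan_out, **base)
        print(data_mb, fan_out, choose_method(edge))
```

Even under these simplified assumptions, the preferred method changes with data size and fan-out, which mirrors the abstract's observation that no single method prevails in all scenarios.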
