Pocket: Elastic Ephemeral Storage for Serverless Analytics

Serverless computing is becoming increasingly popular, enabling users to quickly launch thousands of short-lived tasks in the cloud with high elasticity and fine-grained billing. These properties make serverless computing appealing for interactive data analytics. However, exchanging intermediate data between the execution stages of an analytics job is a key challenge, because direct communication between serverless tasks is difficult. The natural approach is to store such ephemeral data in a remote data store, but existing storage systems are not designed to meet the demands of serverless applications in terms of elasticity, performance, and cost. We present Pocket, an elastic, distributed data store that automatically scales to provide applications with the desired performance at low cost. Pocket dynamically rightsizes resources across multiple dimensions (CPU cores, network bandwidth, storage capacity) and leverages multiple storage technologies to minimize cost while ensuring that applications are not bottlenecked on I/O. We show that Pocket achieves performance similar to ElastiCache Redis for serverless analytics applications while reducing cost by almost 60%.
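
To make the idea of rightsizing across several dimensions concrete, the following is a minimal sketch and not Pocket's actual allocation policy. It assumes each storage technology can be summarized by a per-node throughput, a per-node capacity, and an hourly price; the `Tier` class, the function names, and every number in `EXAMPLE_TIERS` are hypothetical placeholders chosen only for illustration. For each tier it computes how many nodes are needed to satisfy both the throughput and the capacity requirement of a job, then picks the cheapest option.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical summary of one storage technology (not from the paper)."""
    name: str
    node_throughput_gb_per_s: float  # throughput one node can sustain
    node_capacity_gb: float          # data one node can hold
    node_cost_per_hour: float        # price of one node


# Placeholder values for illustration only; not measured or published figures.
EXAMPLE_TIERS = [
    Tier("dram", 8.0, 60.0, 1.0),
    Tier("flash", 8.0, 2000.0, 0.5),
    Tier("hdd", 1.0, 10000.0, 0.25),
]


def nodes_needed(tier, throughput_gb_per_s, capacity_gb):
    """Smallest node count that satisfies both dimensions at once."""
    by_throughput = math.ceil(throughput_gb_per_s / tier.node_throughput_gb_per_s)
    by_capacity = math.ceil(capacity_gb / tier.node_capacity_gb)
    # The tighter of the two constraints decides the allocation.
    return max(1, by_throughput, by_capacity)


def cheapest_allocation(tiers, throughput_gb_per_s, capacity_gb):
    """Return (tier name, node count, hourly cost) of the cheapest feasible tier."""
    best = None
    for tier in tiers:
        count = nodes_needed(tier, throughput_gb_per_s, capacity_gb)
        cost = count * tier.node_cost_per_hour
        if best is None or cost < best[2]:
            best = (tier.name, count, cost)
    return best
```

With these placeholder tiers, a throughput-bound job such as `cheapest_allocation(EXAMPLE_TIERS, 16.0, 100.0)` selects two flash nodes, while a capacity-bound job such as `cheapest_allocation(EXAMPLE_TIERS, 1.0, 5000.0)` selects a single disk node. A real system would also have to account for CPU cores, changing load over a job's lifetime, and mixing tiers, none of which this sketch attempts.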
