Serverless End Game: Disaggregation enabling Transparency

For many years, the distributed systems community has struggled to smooth the transition from local to remote computing. Transparency means concealing the complexities of distributed programming, such as remote locations, failures, and scaling. For us, full transparency implies that we can compile, debug, and run unmodified single-machine code over effectively unlimited compute, storage, and memory resources. In this article, we explain why resource disaggregation in serverless computing is the definitive catalyst for full transparency in the Cloud. We demonstrate with two experiments that transparency can be achieved today over disaggregated serverless resources, with performance comparable to local execution. We also show that locality cannot be neglected for many problems, and we present five open research challenges: granular middleware and locality, memory disaggregation, virtualization, elastic programming models, and optimized deployment. If full transparency is possible, who needs explicit middleware when remote entities can be treated as local ones? Can we close the curtains of distributed systems complexity for the majority of users?
