Occupy the cloud: distributed computing for the 99%

Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead and fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to efficiently implement a number of distributed computing models, such as bulk synchronous parallel (BSP). Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.
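The claim that stateless functions can express BSP rests on one idea: each superstep becomes a parallel map of stateless functions, all state lives in external storage, and waiting for every task to finish acts as the barrier. The sketch below illustrates that idea locally under stated assumptions. It is not PyWren's API. A `concurrent.futures` thread pool stands in for cloud function invocations, a plain dict stands in for disaggregated storage, and the names `superstep_fn` and `run_bsp` and the ring max-propagation workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def superstep_fn(store, step, worker, num_workers):
    """Stateless task: reads inputs from `store`, writes its output back, keeps nothing."""
    left = (worker - 1) % num_workers
    # Communication phase: read own and left neighbour's values from the previous superstep.
    mine = store[(step - 1, worker)]
    theirs = store[(step - 1, left)]
    # Local computation, with the result written to storage for the next superstep.
    store[(step, worker)] = max(mine, theirs)


def run_bsp(initial, num_steps):
    """Run `num_steps` supersteps of ring max-propagation over len(initial) workers."""
    n = len(initial)
    store = {}  # hypothetical stand-in for a shared storage service, keyed by (superstep, worker)
    for w, v in enumerate(initial):
        store[(0, w)] = v
    with ThreadPoolExecutor(max_workers=n) as pool:
        for step in range(1, num_steps + 1):
            # One "map" of stateless tasks per superstep.
            futures = [pool.submit(superstep_fn, store, step, w, n) for w in range(n)]
            for f in futures:
                f.result()  # barrier: wait for every task; re-raises any task's exception
    return [store[(num_steps, w)] for w in range(n)]


print(run_bsp([3, 9, 1, 4, 7], num_steps=4))  # → [9, 9, 9, 9, 9]
```

Because no task holds state between supersteps, any task could run on a fresh function instance. In this ring example the global maximum reaches every worker after n - 1 supersteps.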
