Serverless Supercomputing: High Performance Function as a Service for Science

Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), or be offloaded to specialized accelerators. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most efficient resources. To address these needs we propose funcX, a high-performance function-as-a-service (FaaS) platform that enables intuitive, flexible, efficient, scalable, and performant remote function execution on existing infrastructure including clouds, clusters, and supercomputers. It allows users to register and then execute Python functions without regard for the physical resource location, scheduler architecture, or virtualization technology on which the function is executed, an approach we refer to as "serverless supercomputing." We motivate the need for funcX in science, describe our prototype implementation, and demonstrate, via experiments on two supercomputers, that funcX can process millions of functions across more than 65,000 concurrent workers. We also outline five scientific scenarios in which funcX has been deployed and highlight the benefits of funcX in these scenarios.
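
The register-then-execute pattern described above can be illustrated with a minimal, purely local sketch. This is not the funcX API: the class `ToyFunctionService` and its methods `register`, `run`, and `shutdown` are hypothetical names introduced here for illustration, and a thread pool stands in for the remote clouds, clusters, and supercomputers that an actual deployment would target. The point is only that the caller holds a function identifier and a future, and never refers to where the function runs.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class ToyFunctionService:
    """Hypothetical in-process stand-in for a FaaS service (not the funcX API)."""

    def __init__(self, max_workers: int = 4) -> None:
        # Maps an opaque function ID to the registered Python callable.
        self._functions: Dict[str, Callable[..., Any]] = {}
        # Assumption: a local thread pool plays the role of remote workers.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, func: Callable[..., Any]) -> str:
        """Store a function and return an ID the caller can use later."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def run(self, function_id: str, *args: Any, **kwargs: Any) -> Future:
        """Submit a registered function for execution and return a future."""
        func = self._functions[function_id]
        return self._pool.submit(func, *args, **kwargs)

    def shutdown(self) -> None:
        """Wait for outstanding work and release the workers."""
        self._pool.shutdown(wait=True)


def mean(values):
    """Example analysis function: arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


service = ToyFunctionService()
mean_id = service.register(mean)                 # register once
future = service.run(mean_id, [1.0, 2.0, 3.0])   # execute by ID
result = future.result()                         # block until the result is ready
service.shutdown()
```

In this sketch the caller only ever passes `mean_id` and receives a future, so swapping the thread pool for a different execution backend would not change the calling code. That separation is the property the abstract describes, shown here under the stated assumptions rather than as the platform's actual interface.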
