funcX: A Federated Function Serving Fabric for Science

Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function-as-a-service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130 000 concurrent workers.
