Serverless execution of scientific workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions

Abstract

Scientific workflows consisting of a large number of interdependent tasks represent an important class of complex scientific applications. Recently, a new type of serverless infrastructure has emerged, represented by services such as Google Cloud Functions and AWS Lambda, and also referred to as the Function-as-a-Service model. In this paper we examine such serverless infrastructures, which are designed mainly for processing background tasks of Web and Internet of Things applications, or for event-driven stream processing. We evaluate their applicability to more compute- and data-intensive scientific workflows and discuss possible ways to repurpose serverless architectures for the execution of scientific workflows. We have developed prototype workflow executor functions using AWS Lambda and Google Cloud Functions, coupled with the HyperFlow workflow engine. These functions can run workflow tasks in AWS and Google infrastructures, and feature capabilities such as data staging to and from S3 or Google Cloud Storage and execution of custom application binaries. We have successfully deployed and executed the Montage astronomy workflow, often used as a benchmark, and we report initial results of its performance evaluation. Our findings indicate that the simple mode of operation makes this approach easy to use, although there are costs involved in preparing portable application binaries for execution in a remote environment. While our solution is an early prototype, we find the presented approach highly promising. We also discuss possible future steps related to the execution of scientific workflows in serverless infrastructures. Finally, we perform a cost analysis and discuss its implications for resource management for scientific applications in general.
