Workflows Community Summit: Bringing the Scientific Workflows Community Together

Scientific workflows are used in almost every scientific domain and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and therefore must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed by some software infrastructure. Because workflows are so widely used, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is fragmented and presents significant barriers to entry, because so many seemingly comparable, yet incompatible, systems exist. Consequently, many teams, small and large, still choose to build their own custom workflow solutions rather than adopt, or build upon, existing WMSs. This state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. Its overarching goal was to develop a view of the state of the art and to identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both WMS developers and workflow users) helped identify key challenges, which were translated into six broad themes for the summit; each theme was the subject of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.
