New Execution Paradigm for Data-Intensive Scientific Workflows

With the advent of Grid and service-oriented technologies, scientific workflows were introduced to meet researchers' growing demand for assembling diverse, highly specialized applications and letting them exchange large heterogeneous datasets in order to accomplish a complex scientific task. Much research has already gone into building efficient scientific workflow management systems (WfMS). However, most such WfMS coordinate and execute workflows in a centralized fashion. This creates a single point of failure, forms a scalability bottleneck, and often routes excessive traffic back through the coordinator. Additionally, none of the available WfMS provides a means for dynamic data transformation between services to overcome the data heterogeneity problem. This work presents a new approach to scientific workflow management that targets efficient distributed execution of data-intensive workflows. The proposed approach reduces the communication traffic between services and overcomes the data heterogeneity problem. Moreover, it allows full control over long-running applications and supports smart re-run, distributed fault handling, and distributed load balancing.
