An optimized workflow enactor for data-intensive grid applications

Data-intensive applications benefit from an intrinsic data parallelism that should be exploited on parallel systems to lower execution time. In recent years, data grids have been developed to handle, process, and analyze the tremendous amounts of data produced in many scientific areas. Although very large, these grid infrastructures are under heavy use, and efficiency is of utmost importance. This paper deals with the optimization of workflow managers used for deploying complex data-driven applications on grids. For this kind of application, we show how to exploit data parallelism better than most existing workflow managers currently do. We present the design of a prototype implementing our solution and, using results from a realistic medical imaging application, show that it provides a significant speed-up with respect to existing solutions.
