Eliminating the middleman: peer-to-peer dataflow

Executing large-scale, data-intensive workflows such as Montage efficiently requires taking into account the volume and pattern of communication. When orchestrating data-centric workflows, the centralised servers common to standard workflow systems can become a performance bottleneck. However, standards-based workflow systems that rely on centralisation, e.g., Web service based frameworks, have many other benefits, such as a wide user base and sustained support. This paper presents and evaluates a light-weight hybrid architecture which maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another. Furthermore, our architecture is standards compliant, flexible and non-disruptive; service definitions do not have to be altered prior to enactment. Our architecture could be realised within any existing workflow framework; in this paper, we focus on a Web service based framework. Taking inspiration from Montage, a number of common workflow patterns (sequence, fan-in and fan-out), input-output data size relationships and network configurations are identified and evaluated. The performance analysis concludes that a substantial reduction in communication overhead results in a 2-4 fold performance benefit across all patterns. An end-to-end pattern through the Montage workflow results in an 8 fold performance benefit and demonstrates how the advantage of using our hybrid architecture increases as the complexity of a workflow grows.
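To illustrate why direct exchange between services reduces communication overhead, the following is a minimal sketch for the sequence pattern only. It is a toy model under stated assumptions, not the evaluation method used in the paper: the function names and the `sizes` list are hypothetical, and the model counts only bytes moved, ignoring latency, bandwidth and control messages.

```python
from typing import List


def centralised_bytes(sizes: List[int]) -> int:
    """Bytes moved when every transfer passes through the workflow engine.

    Assumption: sizes[0] is the initial input and sizes[i] is the output
    of the i-th service in the sequence. Each service receives its input
    from the engine and returns its output to the engine.
    """
    total = 0
    for i in range(1, len(sizes)):
        total += sizes[i - 1]  # engine -> service i (input)
        total += sizes[i]      # service i -> engine (output)
    return total


def direct_bytes(sizes: List[int]) -> int:
    """Bytes moved when services pass data directly to one another.

    Assumption: the engine sends the initial input to the first service,
    each intermediate output moves once to the next service, and only the
    final output returns to the engine.
    """
    return sum(sizes)


# Hypothetical example: three services, every data item is 100 units.
example = [100, 100, 100, 100]
central = centralised_bytes(example)
direct = direct_bytes(example)
```

In this model each intermediate result crosses the network twice under centralised orchestration and once under direct exchange, so the gap widens as the sequence grows longer.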
