An End-to-end Workflow Pipeline for Large-scale Grid Computing

In this paper we describe a service-based software architecture that enables end-to-end, high-level workflow processing in a large-scale Grid environment consisting of many heterogeneous resources. The architecture is essentially a pipeline that extends from the abstract specification of an application, through its deployment and execution, to the return of results to the user. It caters for flexible deployment, performance, reliability and charging for resource usage, and it addresses these concerns at the specification level as well as at the realisation (brokering) and execution levels. The proposed architecture is derived from previous work at LeSC, which produced the ICENI pipeline, and from our experience with e-Science projects such as GENIE, e-Protein and RealityGrid, from which we derive a set of key requirements.
