Grid approaches to data-driven scientific and engineering workflows

Enabling the full life cycle of scientific and engineering workflows requires robust middleware and services that support near-real-time data movement, high-performance processing, and effective data management. In this context, we consider two related technology areas: Grid computing, which is fast emerging as an accepted way forward for large-scale, distributed, multi-institutional resource sharing, and database systems, whose capabilities are changing continuously and opening new possibilities for scientific data management on the Grid. In this thesis, we examine the challenging requirements that arise when integrating data-driven scientific and engineering experiment workflows onto the Grid. As an application exemplar, we consider wind tunnels that house multiple experiments with differing characteristics. This thesis contributes two approaches in attempting to tackle some of the following questions: How can domain-specific workflow activities be developed while the underlying complexity stays hidden? Can new experiments be added to the system easily? How can end-to-end experimental workflow support reduce the overall turnaround time? In the first approach, we show how experiment-specific workflows can help accelerate application development using Grid services. This has been realized through the development of MyCoG, the first Commodity Grid toolkit for .NET supporting multi-language programmability. In the second, we present an alternative approach based on federated database services to realize an end-to-end experimental workflow. Using a real-world example, we show how database services can serve as building blocks for scientific and engineering workflows.
