Challenges in using scientific workflow tools in the hydrology domain

Scientific workflow tools are used to perform complex analyses on scientific data. Their strength lies in their ability to capture a complex analysis as a sequence of steps built from simple components. The components may run on different computers or clusters at different geographical locations and access data from heterogeneous sources. Scientific workflow tools are popular in specific scientific domains (e.g., ecology, genomics, and astrophysics) for the integration of simulation models. However, this kind of software framework has not been widely embraced by the hydrology domain. Typically, hydrologists use a combination of off-the-shelf software systems, such as The Invisible Modelling Environment (TIME) or MATLAB, along with other ad hoc software to perform integration and to implement the workflow for their hydrology simulations. In a large-scale hydrology study, such as the Murray-Darling Basin Sustainable Yields project undertaken by CSIRO, a substantial fraction of the overall cost is devoted to developing a collection of software tools to implement the processing flow required for the project. Scientific workflow tools offer significant potential to allow this software to be modularised, reused and shared, and to allow process composition to be semi-automated. For an organisation such as the Bureau of Meteorology Water Division, which is responsible for the routine production of data products like a national water account, these tools can greatly improve the transparency of the method and the auditability of the result. There are several challenges to applying scientific workflows in the hydrology domain. This paper explores some of these challenges: managing and processing large volumes of data, integrating heterogeneous data, and integrating models. Grid computing could be used as an interoperability platform to manage compute and data resources.
Considerable work has been carried out in the grid computing community to develop methods to discover and efficiently access data by enabling services such as Fast Data Transfer, GridFTP and RapidFTP. We recommend that compute and data resources be hosted in the same grid infrastructure so that workflows execute efficiently. We also describe some applications that use grid computing to improve the execution speed of hydrology models. Finally, we report on the adaptation of Kepler for the hydrology domain. Kepler is a service-based workflow tool used extensively in some scientific domains (e.g., ecology). We report on preliminary work in which we modified existing actors to suit our needs and developed new actors that allow access to the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) web services.
