Datagridflows: Managing Long-Run Processes on Datagrids

This paper introduces datagridflows. Until recently, datagrids were generally considered over-hyped, and the associated technologies were not widely embraced in the academic community. Today, datagrids have become a reality and an important technology for managing large, unstructured data and storage resources distributed across autonomous administrative domains. The datagrids now operating in production give us an idea of the new requirements and challenges that future datagrid environments will face. One such requirement is the coordinated execution of long-run data management processes in datagrids. We term these processes “datagridflows”. This new area offers exciting opportunities and challenges to researchers in distributed computing and distributed databases. This paper is intended to introduce these challenges to other researchers, including those new to grid computing. We motivate the topic through a discussion of datagridflow requirements and real production scenarios, and we introduce current work on datagridflow technologies, including the Datagrid Language (DGL) for describing datagridflows.
