Experiments with in-transit processing for data intensive grid workflows

Efficient and robust data streaming and in-transit data manipulation are critical requirements of emerging scientific and engineering application workflows, which are based on seamless interaction and coupling between geographically distributed application components. The overall goal of this research is to address these requirements by developing a data streaming and in-transit data manipulation service. In this paper, we experimentally investigate reactive management strategies for in-transit data manipulation, as well as cooperative end-to-end management of wide-area data streaming and in-transit data manipulation, for data-intensive scientific and engineering workflows.
