A framework for self-optimizing, fault-tolerant, high performance bulk data transfers in a heterogeneous grid environment

The drastic increase in the data requirements of scientific applications, combined with a growing trend towards collaborative research, has created a need to transfer large amounts of data among participating sites. The usual approach to such transfers has been either to dump the data to tapes and mail them, or to run scripts with an operator at each site babysitting the transfers and dealing with failures. We introduce a framework that automates the whole process of data movement between sites. The framework requires no human intervention and recovers automatically from various kinds of storage system, network, and software failures, guaranteeing completion of the transfers. Its monitoring and tuning capability increases the performance of the data transfers on the fly. The framework also generates on-the-fly visualization of the transfers, which makes it simple to identify problems and bottlenecks in the system.
