A data transfer framework for large-scale science experiments

Modern scientific experiments can generate hundreds of gigabytes, terabytes, or even petabytes of data, which may moreover be stored as large numbers of relatively small files. This data frequently must be disseminated to remote collaborators or computational centers for analysis. Moving it with high performance and strong robustness, while also giving users a simple interface, is challenging. We present a data transfer framework comprising a high-performance data transfer library based on GridFTP, a data scheduler, and a graphical user interface that lets users transfer their data easily, reliably, and securely. The system incorporates automatic tuning mechanisms that select, at runtime, the number of concurrent threads used for transfers. It also includes restart mechanisms that handle client, network, and server failures. Experimental results indicate that the system can significantly improve data transfer performance and recovers well from failures.
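The abstract does not say how the runtime tuning chooses the number of concurrent threads. The sketch below shows one plausible approach, a simple hill-climbing heuristic. It is an assumption for illustration, not the framework's actual algorithm. The names `tune_concurrency` and `measure_throughput`, along with the `min_gain` threshold, are hypothetical. `measure_throughput(n)` stands in for a short probe transfer with `n` concurrent threads that returns the observed throughput.

```python
from typing import Callable


def tune_concurrency(
    measure_throughput: Callable[[int], float],
    start: int = 1,
    max_level: int = 64,
    min_gain: float = 0.05,
) -> int:
    """Choose a concurrency level by hill climbing (hypothetical heuristic).

    measure_throughput(n) is an assumed callback that runs a short probe
    transfer with n concurrent threads and returns observed throughput
    (for example, in MB/s). Concurrency is doubled as long as each step
    improves throughput by at least `min_gain` (a fraction). The last
    level that paid off is returned.
    """
    best_level = start
    best_rate = measure_throughput(start)
    level = start
    while level * 2 <= max_level:
        level *= 2
        rate = measure_throughput(level)
        if rate < best_rate * (1.0 + min_gain):
            # Gain too small, or throughput dropped: stop climbing.
            break
        best_level, best_rate = level, rate
    return best_level


if __name__ == "__main__":
    # Synthetic link that saturates at 450 MB/s (illustrative numbers only).
    print(tune_concurrency(lambda n: min(n * 100.0, 450.0)))
```

On this synthetic saturating curve, the tuner stops at 8 threads. At that level the next doubling no longer improves throughput by 5%. Doubling keeps the number of probe transfers logarithmic in `max_level`. A production tuner would probably also re-probe periodically, since wide-area conditions change during long transfers.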