High Throughput WAN Data Transfer with Hadoop-based Storage

The Hadoop Distributed File System (HDFS) has become increasingly popular in recent years as a key building block of integrated grid storage solutions in the field of scientific computing. Wide Area Network (WAN) data transfer is one of the most important data operations for large high energy physics experiments, which must manage, share, and process petabyte-scale datasets in a highly distributed grid computing environment. In this paper, we present our experience with high-throughput WAN data transfer using an HDFS-based Storage Element. Two protocols, GridFTP and Fast Data Transfer (FDT), are used to characterize the network performance of WAN data transfer.
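
As an illustration only (not taken from the paper), the sketch below shows how such transfers might be driven from a client node with the two protocols named above, assuming the GridFTP client (globus-url-copy) and the FDT jar are installed and the HDFS-based Storage Element exposes a gsiftp endpoint. All hostnames, paths, and tuning values are hypothetical placeholders.

#!/usr/bin/env python3
"""Sketch: drive WAN transfers to an HDFS-based Storage Element via GridFTP and FDT.
Hostnames, paths, and tuning parameters are illustrative assumptions, not values from the paper."""
import subprocess

SRC = "file:///data/samples/test_2GB.root"                   # local source file (hypothetical)
DST = "gsiftp://se.example.org:2811/hadoop/store/test.root"  # HDFS-backed GridFTP endpoint (hypothetical)

def gridftp_transfer(parallel_streams=8, tcp_buffer="8M"):
    """Copy SRC to DST with globus-url-copy, using parallel TCP streams."""
    cmd = [
        "globus-url-copy",
        "-vb",                        # report instantaneous transfer rate
        "-p", str(parallel_streams),  # number of parallel data streams
        "-tcp-bs", tcp_buffer,        # TCP buffer size per stream
        SRC, DST,
    ]
    subprocess.run(cmd, check=True)

def fdt_transfer(remote_host="se.example.org", dest_dir="/hadoop/store"):
    """Push the same file with FDT; requires an FDT server listening on the remote host."""
    cmd = [
        "java", "-jar", "fdt.jar",
        "-c", remote_host,            # connect to the remote FDT server
        "-d", dest_dir,               # destination directory on the remote side
        "/data/samples/test_2GB.root",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    gridftp_transfer()
    fdt_transfer()

In practice, the parallel-stream count and TCP buffer size would be tuned to the bandwidth-delay product of the WAN path being characterized.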
