CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC

Modern High Performance Computing (HPC) applications process very large amounts of data. A critical research challenge lies in transporting input data to the HPC center from a number of distributed sources, e.g., scientific experiments and web repositories, and offloading the result data to geographically distributed, intermittently available end-users, often over under-provisioned connections. Such end-user data services are typically performed using point-to-point transfers that are designed for well-endowed sites; these transfers cannot reconcile the center's resource usage with users' delivery deadlines, cannot adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. To overcome these inefficiencies, decentralized HPC data services are emerging as viable alternatives. In this paper, we develop and enhance such distributed data services by designing CATCH, a Cloud-based Adaptive data Transfer service for HPC. CATCH leverages a bevy of cloud storage resources to orchestrate a decentralized data transport with fail-over capabilities. Our results demonstrate that CATCH is a feasible approach and can improve data transfer times at the HPC center by as much as 81.1% for typical HPC workloads.
