Paper Information - Timely Result-Data Offloading for Improved HPC Center Scratch Provisioning and Serviceability

Modern High-Performance Computing (HPC) centers are facing a data deluge from emerging scientific applications. Supporting large data entails a significant commitment of the center's high-throughput storage system, the scratch space. However, scratch space is typically managed using simple “purge policies,” without sophisticated end-user data services to balance resource consumption and user serviceability. End-user data services such as offloading are performed using point-to-point transfers, which cannot reconcile the center's purge deadlines with users' delivery deadlines, cannot adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. Such inefficiencies can be prohibitive to sustaining high performance. In this paper, we address these issues by designing a framework for the timely, decentralized offload of application result data. Our framework uses an overlay of user-specified intermediate and landmark sites to orchestrate a decentralized, fault-tolerant delivery. We have implemented our techniques within a production job scheduler (PBS) and a data transfer tool (BitTorrent). Our evaluation, using both a real implementation and supercomputer job log-driven simulations, shows that offloading times can be significantly reduced (by 90.4 percent for a 5 GB data transfer) and that the exposure window can be minimized while also meeting center-user service level agreements.

Ali Raza Butt | Sudharshan S. Vazhkudai | Henry M. Monti
