HPC Computation on Hadoop Storage with PLFS

In this report we describe how we adapted the Parallel Log Structured Filesystem (PLFS) to enable HPC applications to be able read and write data from the HDFS cloud storage subsystem. Our enhanced version of PLFS provides HPC applications with the ability to concurrently write from multiple compute nodes into a single file stored in HDFS, thus allowing HPC applications to checkpoint. Our results show that HDFS combined with our PLFS HDFS I/O Store module is able to handle a concurrent write checkpoint workload generated by a benchmark with good performance. Acknowledgements: The work in this paper is based on research supported in part by the Los Alamos National Laboratory, under subcontract number 54515 and 153593 (IRHPIT), by the Department of Energy, under award number DE-FC02-06ER25767 (PDSI), by the National Science Foundation, under award number CNS-1042543 (PRObE), and by the Qatar National Research Fund, under award number NPRP 09-1116-1-172 (Qloud). We also thank the members and companies of the PDL Consortium (including Actifio, APC, EMC, Emulex, Facebook, Fusion-IO, Google, Hewlett-Packard, Hitachi, Huawei, IBM, Intel, LSI, Microsoft, NEC, NetApp, Oracle, Panasas, Riverbed, Samsung, Seagate, STEC, Symantec, VMware, Western Digital) for their interest, insights, feedback, and support.

[1]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[2]  John Bent,et al.  PLFS: a checkpoint filesystem for parallel applications , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[3]  E. Lusk,et al.  An abstract-device interface for implementing portable parallel-I/O interfaces , 1996, Proceedings of 6th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '96).

[4]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[5]  Frank B. Schmuck,et al.  GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.

[6]  Bin Zhou,et al.  Scalable Performance of the Panasas Parallel File System , 2008, FAST.

[7]  GhemawatSanjay,et al.  The Google file system , 2003 .

[8]  Prashant Pandey,et al.  Cloud Analytics: Do We Really Need to Reinvent the Storage Stack? , 2009, HotCloud.