Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines

Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions that improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance-matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) so as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that meets the checkpointing performance requirements of HPC applications using a least-cost storage configuration. We evaluate our approach using both an implementation on a large-scale cluster and a simulation driven by six years' worth of Jaguar supercomputer job logs, and show that, by choosing an appropriate storage configuration, our approach achieves 41.5% cost savings with negligible impact on performance.

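The provisioning algorithm itself is not given in the abstract; the sketch below illustrates the general shape of such a least-cost search in Python, under an assumed linear cost/bandwidth model. The tier names, the per-GB prices and bandwidths, and the brute-force enumeration are all illustrative assumptions, not details from the paper.

```python
from itertools import product

# Hypothetical per-GB prices ($/GB) and write bandwidths (GB/s per GB
# provisioned); real values would come from measurement, not this sketch.
TIERS = {
    "dram": {"cost_per_gb": 8.0, "bw_per_gb": 0.50},
    "ssd":  {"cost_per_gb": 1.0, "bw_per_gb": 0.05},
}

def provision(capacity_gb, bandwidth_gbps, step_gb=64):
    """Enumerate DRAM/SSD mixes in step_gb increments and return the
    (cost, dram_gb, ssd_gb) tuple of the cheapest configuration that
    meets both the capacity and checkpoint-bandwidth requirements."""
    best = None
    sizes = range(0, capacity_gb + step_gb, step_gb)
    for dram_gb, ssd_gb in product(sizes, repeat=2):
        if dram_gb + ssd_gb < capacity_gb:
            continue  # not enough room to hold one checkpoint
        bw = (dram_gb * TIERS["dram"]["bw_per_gb"] +
              ssd_gb * TIERS["ssd"]["bw_per_gb"])
        if bw < bandwidth_gbps:
            continue  # too slow to absorb the checkpoint in time
        cost = (dram_gb * TIERS["dram"]["cost_per_gb"] +
                ssd_gb * TIERS["ssd"]["cost_per_gb"])
        if best is None or cost < best[0]:
            best = (cost, dram_gb, ssd_gb)
    return best

# A 2 TB staging area that must absorb checkpoints at 120 GB/s:
# pure SSD is too slow here and pure DRAM too expensive, so the
# search settles on a small DRAM slice backed by a large SSD pool.
print(provision(capacity_gb=2048, bandwidth_gbps=120.0))
```

A real provisioner would face a much richer search space and model (per-node bandwidth limits, failure rates, interference with the running application), but the objective is the same: the cheapest tier mix that still meets the checkpoint deadline.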