Hybrid flash arrays for HPC storage systems: An alternative to burst buffers

Cloud and high-performance computing storage systems comprise thousands of physical storage devices and use software that organizes them into multiple data tiers based on access frequency. The characteristics of these devices lend themselves well to tiering because devices differ in their ratio of performance to capacity. For this reason, these systems have, for the past several years, incorporated a mix of flash devices and mechanical spinning hard disk drives. Although a single media type would be ideal, the economic reality is that a hybrid system must use flash for performance and disk for capacity. Within the high-performance computing community, flash has been used to create a new tier called burst buffers, which are typically software managed, user visible, tied to a particular file system, and buffer all IO traffic before subsequent migration to disk. In this paper, we propose an alternative architecture that is hardware managed, user transparent, and file system agnostic, and that buffers only small IO while allowing large sequential IO to access the disks directly. Our evaluation of this alternative architecture finds that it achieves results comparable to the reported burst buffer numbers and improves on systems composed solely of disks by several orders of magnitude for a fraction of the cost.
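The routing idea behind the proposed architecture can be illustrated with a minimal sketch. It is not the paper's implementation: the proposed design is hardware managed, whereas this is plain software, and the names `IORequest`, `route`, and `SMALL_IO_THRESHOLD`, as well as the 1 MiB cutoff and the choice to send large non-sequential requests to flash, are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff between "small" and "large" IO (1 MiB).
# The abstract does not state a threshold; this value is an assumption.
SMALL_IO_THRESHOLD = 1 << 20


@dataclass(frozen=True)
class IORequest:
    """A single IO request, reduced to the two properties the policy uses."""
    size: int          # request size in bytes
    sequential: bool   # True if the request continues a sequential stream


def route(req: IORequest, threshold: int = SMALL_IO_THRESHOLD) -> str:
    """Return the tier that should service the request.

    Large sequential IO bypasses the flash tier and goes straight to disk;
    everything else is buffered in flash. Treating large non-sequential
    requests as flash traffic is an assumption of this sketch.
    """
    if req.size >= threshold and req.sequential:
        return "disk"
    return "flash"


if __name__ == "__main__":
    for req in (IORequest(4096, False), IORequest(8 << 20, True)):
        print(req, "->", route(req))
```

Under these assumptions, a 4 KiB random write is buffered in flash while an 8 MiB sequential write goes directly to disk, in contrast to a burst buffer, which would absorb both.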
