Data Elevator: Low-Contention Data Movement in Hierarchical Storage System

Hierarchical storage subsystems that include multiple layers of burst buffers (BB) and disk-based parallel file systems (PFS) are becoming an essential part of HPC systems to address the I/O performance gap. However, state-of-the-art software for managing these hierarchical storage subsystems, such as Cray DataWarp, requires users to move data among the storage layers themselves. Such manual data movement may perform poorly because the I/O servers of a layer face resource contention between serving data movement within the hierarchy and serving regular read/write requests. In this paper, we propose a new system, named Data Elevator, for moving data transparently and efficiently through hierarchical storage. Users specify the final destination for their data, typically a PFS. Data Elevator intercepts the I/O calls, stages data on a fast persistent storage layer (for example, an SSD-based burst buffer), and then asynchronously transfers the data to the final destination in the background. Data Elevator reduces resource contention on BB servers by offloading data movement from a fixed number of BB server nodes to compute nodes, where the number of compute nodes is configurable based on the data movement load. Data Elevator also enables optimizations such as overlapping read and write operations, choosing I/O modes, and aligning buffer boundaries. In our tests with large-scale scientific applications, Data Elevator is as much as 4.2X faster than Cray DataWarp, and 4X faster than writing data directly to the PFS.
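
The stage-then-drain pattern described above can be illustrated with a short sketch. The C example below is a minimal conceptual illustration under assumed names, not the actual Data Elevator implementation or API: the application writes its output to a fast burst-buffer path at full speed, and a background thread asynchronously drains the file to its final PFS destination. The paths /bb/output.dat and /pfs/output.dat and the drain_to_pfs() helper are hypothetical; the real system intercepts I/O calls so that the application needs no such changes.

```c
/*
 * Conceptual sketch of burst-buffer staging with asynchronous drain to PFS.
 * All paths and helper names are hypothetical illustrations, not the
 * Data Elevator API. Build with: cc staging.c -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct drain_job {
    const char *bb_path;   /* staged copy on the fast burst-buffer tier */
    const char *pfs_path;  /* user-specified final destination on the PFS */
};

/* Background drain: stream the staged file from the BB to the PFS. */
static void *drain_to_pfs(void *arg)
{
    struct drain_job *job = arg;
    FILE *src = fopen(job->bb_path, "rb");
    FILE *dst = fopen(job->pfs_path, "wb");
    if (!src || !dst) { perror("fopen"); exit(1); }

    char buf[1 << 20];  /* 1 MiB chunks; a real system would align to stripe size */
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, src)) > 0)
        fwrite(buf, 1, n, dst);

    fclose(src);
    fclose(dst);
    return NULL;
}

int main(void)
{
    struct drain_job job = { "/bb/output.dat", "/pfs/output.dat" };

    /* 1. The application writes at burst-buffer speed. */
    FILE *f = fopen(job.bb_path, "wb");
    if (!f) { perror("fopen"); return 1; }
    const char data[] = "simulation output";
    fwrite(data, 1, sizeof data, f);
    fclose(f);

    /* 2. Drain to the PFS asynchronously while computation continues. */
    pthread_t t;
    pthread_create(&t, NULL, drain_to_pfs, &job);

    /* ... application continues computing here ... */

    pthread_join(t, NULL);  /* ensure the PFS copy is complete before exit */
    return 0;
}
```

In the actual system this drain work runs on a configurable number of compute nodes rather than in an application thread, which is how Data Elevator keeps the transfer load off the fixed pool of BB server nodes.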
