Accelerating Science with the NERSC Burst Buffer Early User Program

Wahid Bhimji∗, Debbie Bard∗, Melissa Romanus∗†, David Paul∗, Andrey Ovsyannikov∗, Brian Friesen∗, Matt Bryson∗, Joaquin Correa∗, Glenn K. Lockwood∗, Vakho Tsulaia∗, Suren Byna∗, Steve Farrell∗, Doga Gursoy‡, Chris Daley∗, Vince Beckner∗, Brian Van Straalen∗, David Trebotich∗, Craig Tull∗, Gunther Weber∗, Nicholas J. Wright∗, Katie Antypas∗, Prabhat∗

∗ Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. Email: wbhimji@lbl.gov
† Rutgers Discovery Informatics Institute, Rutgers University, Piscataway, NJ, USA
‡ Advanced Photon Source, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA

Abstract—NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base comprising over 6500 users in 700 different projects, spanning a wide variety of scientific computing applications. The use cases for the Burst Buffer at NERSC are therefore correspondingly numerous and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge, this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads, so these lessons will be of considerable benefit in shaping the developing use of Burst Buffers at HPC centers.

Index Terms—Nonvolatile memory, Data storage systems, Burst Buffer, Parallel I/O, High Performance Computing

I. INTRODUCTION

HPC faces a growing I/O challenge. One path forward is a fast storage layer, close to the compute nodes, termed a Burst Buffer [1]. Such a layer was deployed with the first phase of the Cori Cray XC40 system at NERSC in the latter half of 2015, providing around 900 TB of NVRAM-based storage. This system not only employs state-of-the-art SSD hardware but also a new approach to on-demand filesystems through Cray's DataWarp software. To enable scientific applications to utilize this new layer in the storage hierarchy, NERSC is running the Burst Buffer Early User Program, focused on real science applications and workflows that can benefit from the accelerated I/O the system provides. The program provides a means to test and debug the new technology as well as to drive new science results.

In this paper we first briefly review the motivation for Burst Buffers and the range of potential use cases for NERSC's diverse scientific workload. We then provide a brief overview of the architecture deployed at NERSC in Section II-B before outlining the Early User Program and the projects selected. We then focus on five specific projects, describing in detail their workflows, initial results, and performance measurements. We conclude with several important lessons learned from this first application of Burst Buffers at scale for HPC.

A. The I/O Hierarchy

Recent hardware advancements in HPC systems have enabled scientific simulations and experimental workflows to tackle larger problems than ever before. The increase in scale and complexity of applications and scientific instruments has led to a corresponding increase in data exchange, interaction, and communication. Efficient management of I/O has become one of the biggest challenges in accelerating time-to-discovery for science.

Historically, the memory architecture of HPC machines has involved compute nodes with on-node memory (DRAM), a limited number of I/O subsystem nodes for handling I/O requests, and a disk-based storage appliance exposed as a parallel file system. DRAM node memory is an expensive commodity with limited capacity but fast read/write access, while disk-based storage systems provide a relatively inexpensive way to store and persist large amounts of data, albeit with considerably lower bandwidth and higher latency. This traditional HPC architecture is often unable to meet the I/O coordination and communication needs of the applications that run on it, particularly at extreme scale. To address this I/O bottleneck, system architects have explored cost-effective memory and filesystem solutions that offer faster performance than parallel filesystems built on disk-based storage.

A natural extension of this work has been to deepen the memory hierarchy of HPC machines with additional storage layers between DRAM and disk. These solutions leverage technology advancements such as solid-state devices (SSDs), as well as other flash-based and/or NVRAM offerings. Some state-of-the-art HPC systems therefore now include a new tier of 'intermediate' storage between the compute nodes and the hard-disk storage, known as a 'Burst Buffer'. This layer is slower (but higher in capacity) than on-node memory, and faster (but lower in capacity) than HDD-based storage.
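To make the usage model of such an intermediate tier concrete, the minimal sketch below (in Python, for illustration only; none of this code is from the paper) shows how an application-side script might direct checkpoint output to a per-job Burst Buffer allocation when one is present and fall back to the parallel file system otherwise. It assumes the DW_JOB_STRIPED environment variable that Cray DataWarp sets to the mount point of a striped, job-lifetime allocation; the file names and NumPy checkpoint format are hypothetical.

import os
import numpy as np

def checkpoint_dir():
    """Prefer the per-job DataWarp (Burst Buffer) mount point if the
    scheduler has provided one; otherwise fall back to scratch on the
    parallel file system. DW_JOB_STRIPED is assumed to point at a striped,
    job-lifetime DataWarp allocation; SCRATCH is the usual Lustre scratch."""
    return os.environ.get("DW_JOB_STRIPED") or os.environ.get("SCRATCH", ".")

def write_checkpoint(step, state):
    """Write one checkpoint file per step (file naming is illustrative)."""
    path = os.path.join(checkpoint_dir(), "checkpoint_{:06d}.npy".format(step))
    np.save(path, state)
    return path

if __name__ == "__main__":
    # Toy 'simulation state'; a real code would write field or particle data,
    # typically through MPI-IO or HDF5 rather than NumPy.
    state = np.zeros((1024, 1024))
    print("checkpoint written to", write_checkpoint(0, state))

In practice the allocation itself is requested through DataWarp directives in the batch script, and any data that must persist beyond the job lifetime is staged out to the parallel file system; the point of the sketch is simply that, from the application's perspective, the Burst Buffer appears as another (much faster) file system path.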

REFERENCES

[1] K. Bowers et al., "Ultrahigh performance three-dimensional electromagnetic relativistic kinetic plasma simulation," 2008.
[2] S. Byna et al., "Taming parallel I/O complexity with auto-tuning," in Proc. SC '13: International Conference for High Performance Computing, Networking, Storage and Analysis, 2013.
[3] W. Daughton et al., "Role of electron physics in the development of turbulent magnetic reconnection in collisionless plasmas," 2011.
[4] M. White et al., "The Lyman α forest in optically thin hydrodynamical simulations," arXiv:1406.6361, 2014.
[5] T. Lanzirotti et al., "Scientific Data Exchange: a schema for HDF5-based storage of raw and analyzed data," Journal of Synchrotron Radiation, 2014.
[6] R. Thakur et al., "Data sieving and collective I/O in ROMIO," in Proc. Frontiers '99: Seventh Symposium on the Frontiers of Massively Parallel Computation, 1999.
[7] M. F. Adams et al., "High-resolution simulation of pore-scale reactive transport processes associated with carbon sequestration," Computing in Science & Engineering, 2014.
[8] R. B. Ross et al., "On the role of burst buffers in leadership-class storage systems," in 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
[9] T. Maier et al., "ATLAS I/O performance optimization in as-deployed environments," 2015.
[10] D. Trebotich et al., "An adaptive finite volume method for the incompressible Navier–Stokes equations in complex geometries," 2015.
[11] R. Latham et al., "24/7 characterization of petascale I/O workloads," in 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
[12] F. De Carlo et al., "TomoPy: a framework for the analysis of synchrotron tomographic data," in Optics & Photonics: Optical Engineering + Applications, 2014.
[13] J. M. Sexton et al., "Nyx: a massively parallel AMR code for computational cosmology," 2013.
[14] A. Krasznahorkay et al., "Implementation of the ATLAS Run 2 event data model," 2015.
[15] J. Shalf et al., "Tuning HDF5 for Lustre file systems," 2010.
[16] A. Shoshani et al., "Parallel I/O, analysis, and visualization of a trillion particle simulation," in Proc. SC '12: International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
[17] P. Calafiura et al., "Fine grained event processing on HPCs with the ATLAS Yoda system," 2015.
[18] H. Childs et al., "VisIt: an end-user tool for visualizing and analyzing very large data," 2011.