Characterizing output bottlenecks in a supercomputer

Supercomputer I/O loads are often dominated by writes. High-performance computing (HPC) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limits in the operating systems of the client compute nodes, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and for the design of I/O middleware systems on shared supercomputers.
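
The effect of stragglers on coupled output can be illustrated with a minimal sketch. It is not the paper's methodology: it assumes a write split evenly across storage targets that completes only when the slowest target finishes, and the function names and bandwidth figures are hypothetical.

```python
def striped_write_time(total_mb, target_bandwidths_mbps):
    """Seconds to finish a write striped evenly across storage targets.

    Assumption: each target receives an equal share, and the write is
    complete only when the slowest target has absorbed its share.
    """
    share_mb = total_mb / len(target_bandwidths_mbps)
    return max(share_mb / bw for bw in target_bandwidths_mbps)


def effective_bandwidth(total_mb, target_bandwidths_mbps):
    """Delivered bandwidth (MB/s) seen by the application for the whole write."""
    return total_mb / striped_write_time(total_mb, target_bandwidths_mbps)


# Hypothetical per-target bandwidths in MB/s (not measured values).
uniform = [100.0, 100.0, 100.0, 100.0]
one_straggler = [100.0, 100.0, 100.0, 25.0]

print(effective_bandwidth(400.0, uniform))        # aggregate equals the sum of targets
print(effective_bandwidth(400.0, one_straggler))  # limited by the slowest target
```

Under these assumptions, a single slow target caps the whole striped write at the stripe count times the slowest target's bandwidth, even though the other targets sit idle after finishing their shares.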
