Scalable I/O forwarding framework for high-performance computing systems

Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. While Moore's law ensures that the computational power of high-performance computing systems increases with every generation, the same is not true for their I/O subsystems. The scalability challenges faced by existing parallel file systems with respect to the increasing number of clients, coupled with the minimalistic compute node kernels running on these machines, call for a new I/O paradigm to meet the requirements of data-intensive scientific applications. I/O forwarding is a technique that attempts to bridge the increasing performance and scalability gap between the compute and I/O components of leadership-class machines by shipping I/O calls from compute nodes to dedicated I/O nodes. The I/O nodes perform operations on behalf of the compute nodes and can reduce file system traffic by aggregating, rescheduling, and caching I/O requests. This paper presents an open, scalable I/O forwarding framework for high-performance computing systems. We describe an I/O protocol and API for shipping function calls from compute nodes to I/O nodes, and we present a quantitative analysis of the overhead associated with I/O forwarding.
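
To make the forwarding idea concrete, below is a minimal, self-contained C sketch of the concept described above: a compute-node stub packs an I/O call into a request message, and an I/O-node handler decodes it and performs the operation on the compute node's behalf. This is an illustration only, not the paper's actual protocol or API; the names (fwd_request, cn_forward_write, ion_handle_request) and the fixed-size message layout are hypothetical, and a real implementation would ship the request over a network transport to a dedicated I/O node rather than invoking the handler in the same process.

    /*
     * Hypothetical sketch of I/O forwarding: the compute-node (CN) side
     * packs a write() into a request; the I/O-node (ION) side executes it.
     * In this sketch the "ION" handler is called directly so the example
     * runs stand-alone; a real system would send the message over the
     * network and return the result to the compute node.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    enum fwd_op { FWD_WRITE };               /* forwarded operation codes   */

    struct fwd_request {                     /* wire format of one request  */
        uint32_t op;                         /* which I/O call is shipped   */
        uint64_t offset;                     /* file offset for the call    */
        uint64_t length;                     /* payload size in bytes       */
        char     path[256];                  /* target file on the I/O node */
        char     data[4096];                 /* inline payload (small I/O)  */
    };

    /* I/O-node side: execute the request on behalf of the compute node. */
    static ssize_t ion_handle_request(const struct fwd_request *req)
    {
        if (req->op != FWD_WRITE)
            return -1;
        int fd = open(req->path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        ssize_t rc = pwrite(fd, req->data, req->length, (off_t)req->offset);
        close(fd);
        return rc;                           /* result returned to the CN   */
    }

    /* Compute-node side: pack a write into a request instead of issuing
     * it against the parallel file system directly. */
    static ssize_t cn_forward_write(const char *path, const void *buf,
                                    size_t len, off_t offset)
    {
        struct fwd_request req = { .op = FWD_WRITE,
                                   .offset = (uint64_t)offset,
                                   .length = 0 };
        size_t n = len < sizeof(req.data) ? len : sizeof(req.data);
        req.length = (uint64_t)n;
        snprintf(req.path, sizeof(req.path), "%s", path);
        memcpy(req.data, buf, n);
        /* A real system would transmit req to an I/O node here. */
        return ion_handle_request(&req);
    }

    int main(void)
    {
        const char msg[] = "forwarded write\n";
        ssize_t rc = cn_forward_write("/tmp/iofwd_demo.out", msg,
                                      sizeof(msg) - 1, 0);
        printf("forwarded write returned %zd\n", rc);
        return rc < 0 ? 1 : 0;
    }

Because all compute-node requests funnel through the I/O node in this fashion, the I/O node is also the natural place to aggregate, reschedule, or cache requests before they reach the parallel file system, which is the traffic-reduction benefit the abstract refers to.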
