Supporting efficient noncontiguous access in PVFS over InfiniBand

Noncontiguous I/O access is the main access pattern in many scientific applications. Noncontiguity exists both in the file being accessed and in the target memory regions on the client. Cluster file systems therefore need native support for noncontiguous I/O access to achieve high performance. In this paper, we address the transmission of noncontiguous data between the client and the I/O server in cluster file systems over a high-performance network. We propose a novel approach, RDMA Gather/Scatter, to transfer noncontiguous data for such I/O accesses. We also propose a new scheme, optimistic group registration, to reduce the memory registration costs associated with this approach. We have designed and incorporated this approach into a version of PVFS over InfiniBand. Through a range of PVFS and MPI-IO micro-benchmarks and the NAS BTIO benchmark, we demonstrate that our approach attains significant performance gains over existing approaches.
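The abstract names optimistic group registration without describing its mechanics, so the following is only a minimal Python sketch of one plausible reading of the idea, not the authors' implementation. It assumes a client buffer whose data lies in several noncontiguous pieces and tries to register the single address range spanning all pieces in one call; if that call fails (for example, because a hole in the range is not mapped), it falls back to registering each piece separately. The 4 KiB page size, the cost model (a fixed cost per registration call plus a cost per pinned page), and the names `register_segments`, `register_region`, `REG_CALL_COST` and `REG_PAGE_COST` are hypothetical; `register_region` merely stands in for a real registration call such as libibverbs' `ibv_reg_mr`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PAGE_SIZE = 4096      # assumed page size in bytes
REG_CALL_COST = 10.0  # hypothetical fixed cost per registration call (not measured)
REG_PAGE_COST = 0.5   # hypothetical cost per pinned page (not measured)


@dataclass(frozen=True)
class Segment:
    addr: int    # start address of one contiguous piece of the user buffer
    length: int  # length of the piece in bytes (assumed > 0)


def page_count(start: int, end: int) -> int:
    """Number of pages touched by the half-open range [start, end)."""
    return (end - 1) // PAGE_SIZE - start // PAGE_SIZE + 1


def registration_cost(start: int, end: int) -> float:
    """Hypothetical cost of one registration call covering [start, end)."""
    return REG_CALL_COST + REG_PAGE_COST * page_count(start, end)


def register_segments(
    segs: List[Segment],
    register_region: Callable[[int, int], bool],
) -> Tuple[List[Tuple[int, int]], float]:
    """Optimistically register one span covering every segment; if that
    fails, register each segment on its own. Returns (ranges, total cost)."""
    lo = min(s.addr for s in segs)
    hi = max(s.addr + s.length for s in segs)
    # Assumption: a failed optimistic attempt costs as much as a successful one.
    cost = registration_cost(lo, hi)
    if register_region(lo, hi):
        return [(lo, hi)], cost
    # Fallback: one registration call per contiguous piece.
    ranges = []
    for s in segs:
        start, end = s.addr, s.addr + s.length
        if not register_region(start, end):
            raise MemoryError(f"cannot register [{start:#x}, {end:#x})")
        ranges.append((start, end))
        cost += registration_cost(start, end)
    return ranges, cost


if __name__ == "__main__":
    # Eight 1 KiB pieces spaced 4 KiB apart inside one mapped buffer.
    segs = [Segment(0x10000 + i * PAGE_SIZE, 1024) for i in range(8)]
    print(register_segments(segs, lambda start, end: True))
    # Simulate a range that cannot be registered as a whole.
    print(register_segments(segs, lambda start, end: end - start <= 1024))
```

Under these made-up costs, the single grouped call covers 8 pages and costs 14, whereas registering the eight pieces one by one costs 8 × 10.5 = 84. If the grouped call fails, the sketch pays 14 + 84 = 98, which is why the grouping is "optimistic": it wins whenever the spanning range can be registered and loses a little when it cannot. A real policy would also have to weigh the extra pages pinned in the gaps between pieces.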
