High performance support of parallel virtual file system (PVFS2) over Quadrics

Parallel I/O needs to keep pace with the demands of high performance computing applications on systems of ever-increasing speed. Exploiting high-end interconnect technologies to reduce network access cost and scale aggregate bandwidth is one way to increase the performance of storage systems. In this paper, we explore the challenges of supporting a parallel file system with the modern features of Quadrics, including user-level communication and RDMA operations. We design and implement a Quadrics-capable version of a parallel file system (PVFS2). Our design overcomes the challenges that the static communication model of Quadrics imposes on dynamic client/server architectures. Quadrics QDMA and RDMA mechanisms are integrated and optimized for high performance data communication. Zero-copy PVFS2 list I/O is achieved with a Single Event Associated MUltiple RDMA (SEAMUR) mechanism. Experimental results indicate that Quadrics user-level protocols and RDMA operations significantly improve the performance of PVFS2 for both data transfer and management operations. With four I/O server nodes, our implementation improves PVFS2 aggregated read bandwidth by up to 140% compared to PVFS2 over TCP on top of the Quadrics IP implementation. Moreover, it delivers significant performance improvements to application benchmarks such as mpi-tile-io [24] and BTIO [26]. To the best of our knowledge, this is the first work in the literature to report the design of a high performance parallel file system over Quadrics user-level communication protocols.

[1]  Dhabaleswar K. Panda,et al.  PVFS over InfiniBand: design and performance evaluation , 2003, 2003 International Conference on Parallel Processing.

[2]  Fabrizio Petrini,et al.  QsNet II: An Interconnect for Supercomputing Applications , 2004 .

[3]  Rajeev Thakur,et al.  On implementing MPI-IO portably and with high performance , 1999, IOPADS '99.

[4]  Yuanyuan Zhou,et al.  Experiences with VI communication for database storage , 2002, ISCA.

[5]  Wu-chun Feng,et al.  The Quadrics Network: High-Performance Clustering Technology , 2002, IEEE Micro.

[6]  Robert B. Ross,et al.  PVFS: A Parallel File System for Linux Clusters , 2000, Annual Linux Showcase & Conference.

[7]  Robert Latham,et al.  The Impact of File Systems on MPI-IO Scalability , 2004, PVM/MPI.

[8]  Sudhakar Yalamanchili,et al.  Interconnection Networks: An Engineering Approach , 2002 .

[9]  Peter F. Corbett,et al.  The Direct Access File System , 2003, FAST.

[10]  Robert B. Ross,et al.  Noncontiguous I/O through PVFS , 2002, Proceedings. IEEE International Conference on Cluster Computing.

[11]  Charles L. Seitz,et al.  Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.

[12]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988 .

[13]  Dhabaleswar K. Panda,et al.  Supporting efficient noncontiguous access in PVFS over Infiniband , 2003, 2003 Proceedings IEEE International Conference on Cluster Computing.

[14]  David Kotz,et al.  The galley parallel file system , 1997, ICS '96.

[15]  Dhabaleswar K. Panda,et al.  Design and implementation of Open MPI over Quadrics/Elan4 , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[16]  Rob VanderWijngaart,et al.  NAS Parallel Benchmarks I/O Version 2.4 , 2002 .

[17]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988, SIGMOD '88.

[18]  Andrew A. Chien,et al.  PPFS: a high performance portable parallel file system , 1995, ICS '95.

[19]  Tao Yang,et al.  The Panasas ActiveScale Storage Cluster - Delivering Scalable High Bandwidth Storage , 2004, Proceedings of the ACM/IEEE SC2004 Conference.

[20]  Robert B. Ross,et al.  BMI: a network abstraction layer for parallel I/O , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.