RAMA: Easy Access to a High-Bandwidth Massively Parallel File System

Massively parallel file systems must provide high-bandwidth file access to programs running on their machines. Most accomplish this goal by striping files across arrays of disks attached to a few specialized I/O nodes in the massively parallel processor (MPP). This arrangement requires programmers to give the file system many hints about how their data should be laid out on disk if they want to achieve good performance. Additionally, the custom interface makes massively parallel file systems hard for programmers to use and difficult to integrate seamlessly into an environment with workstations and tertiary storage. The RAMA file system addresses these problems by providing a massively parallel file system that does not need user hints to deliver good performance. RAMA takes advantage of the recent decrease in physical disk size by assuming that each processor in an MPP has one or more disks attached to it. Hashing is then used to pseudo-randomly distribute data across all of these disks, ensuring high bandwidth regardless of access pattern. Since MPP programs often have many nodes accessing a single file in parallel, the file system must allow access to different parts of the file without relying on any particular node. In RAMA, a file request involves only two nodes: the node making the request and the node on whose disk the data is stored. Thus, RAMA scales well to hundreds of processors. Since RAMA needs no layout hints from applications, it fits well into systems where users cannot (or will not) provide such hints. Fortunately, this flexibility does not cause a large loss of performance. RAMA's simulated performance is within 10-15% of the optimum performance of a similarly sized striped file system, and is a factor of 4 or more better than a striped file system with poorly laid out data.
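The core placement idea can be illustrated with a minimal sketch. The function names, hash choice, and node count below are illustrative assumptions, not details from the paper; the point is that a block's location is a pure function of its identity, so any node can locate any block without layout hints or a central metadata lookup:

```python
# Hypothetical sketch of hash-based block placement in the RAMA style.
# NUM_NODES and the hash function are assumptions for illustration only.
import hashlib

NUM_NODES = 128  # assumed MPP size; each node has one or more local disks

def block_location(file_id: int, block_number: int,
                   num_nodes: int = NUM_NODES) -> int:
    """Return the node whose disk holds this file block.

    Hashing (file_id, block_number) pseudo-randomly scatters a file's
    blocks across all nodes, so parallel readers hit many disks
    regardless of access pattern, and a request involves only the
    requesting node and the node computed here.
    """
    key = f"{file_id}:{block_number}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

# Consecutive blocks of one file land on pseudo-randomly chosen nodes.
placements = [block_location(42, b) for b in range(8)]
```

Because placement is deterministic, every node computes the same answer independently, which is what lets a request involve only two nodes rather than a coordinating I/O server.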
