Design and evaluation of fault-tolerant shared file system for cluster systems

This paper describes the design and evaluation of a Fault-Tolerant Shared File System (FTSFS) architecture for cluster systems with shared disks. The FTSFS architecture guarantees that the file system (FS) structure does not crash on processor or program failure; can be applied to any existing nonshared FS without changing the structure of that FS; and does not degrade the performance of the shared FS relative to a standard nonshared FS. Using the FTSFS architecture, we implemented a fault-tolerant shared FS on Fujitsu's SVR4 duplex system and evaluated its performance. The evaluation showed that the shared FS is competitive in performance with the standard SVR4-UFS (Unix File System).
