TOFF-2: A high-performance fault-tolerant file service

Abstract: TOFF-2 is a high-performance fault-tolerant file service built on a new Symmetric Primary Backup (SPB) replication model. This model lets all replicated servers in a service share the load that a traditional primary server would carry alone, and it minimizes the communication overhead between servers. TOFF-2 is also fully transparent to client machines: any host with an NFS client implementation can use the fault-tolerant service without modification. Clients do not need to know anything about replication or server failures, since the TOFF-2 service cluster looks exactly like a single server over the network. This article introduces the concept and design of TOFF-2, and statistics taken from tests on a prototype implementation show promising results.
