Safe caching in a distributed file system for network attached storage

In a distributed file system built on network attached storage, client computers access data directly from shared storage, rather than submitting I/O requests through a server. Without a server marshaling access to data, if a computer fails or becomes isolated in a network partition while holding locks on cached data objects, those objects become inaccessible to other computers until a locking authority can guarantee that the lock holder will not directly access these data again. We describe a server that acts as the locking authority and implements a lease-based protocol for revoking access to data objects locked by an isolated or failed computer. When a lease expires, the server can be assured that the client no longer acts on locked data, and it can safely redistribute the locks to other clients. During normal operation, this protocol incurs no message overhead, uses no memory, and performs no computation at the locking authority.
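The lease idea summarized above can be illustrated with a minimal sketch. It is not the protocol from the paper: the class names `ClientLease` and `LockAuthority`, the `skew_margin` parameter, and the choice to measure the client's lease from the time a request was sent are all assumptions made for illustration. The sketch only shows the ordering argument: the client stops touching storage when its own lease runs out, and the authority waits at least one full lease period after it first fails to reach a client before handing that client's locks to anyone else.

```python
from dataclasses import dataclass, field


@dataclass
class ClientLease:
    """Client-side lease state (hypothetical name, for illustration only)."""
    duration: float
    expires_at: float = 0.0

    def renew(self, sent_at: float) -> None:
        # Assumption: the lease is measured from when the request was SENT,
        # so the client's view of expiry is never later than the server's.
        self.expires_at = sent_at + self.duration

    def may_access(self, now: float) -> bool:
        # The client must check this before any direct access to shared storage.
        return now < self.expires_at


@dataclass
class LockAuthority:
    """Server-side locking authority (hypothetical name, for illustration only)."""
    duration: float
    skew_margin: float  # assumed bound on clock-rate differences
    locks: dict = field(default_factory=dict)          # object -> holder
    suspect_since: dict = field(default_factory=dict)  # client -> first failure time

    def grant(self, obj: str, client: str) -> bool:
        holder = self.locks.get(obj)
        if holder is None or holder == client:
            self.locks[obj] = client
            return True
        return False

    def mark_unreachable(self, client: str, now: float) -> None:
        # Lease state is created only when a client stops responding;
        # nothing is tracked per lease while clients behave normally.
        self.suspect_since.setdefault(client, now)

    def reclaim_expired(self, now: float) -> list:
        """Free the locks of clients whose lease must have expired by `now`."""
        freed = []
        for client, since in list(self.suspect_since.items()):
            if now >= since + self.duration + self.skew_margin:
                for obj in [o for o, c in self.locks.items() if c == client]:
                    del self.locks[obj]
                    freed.append(obj)
                del self.suspect_since[client]
        return freed
```

In this sketch a client that last renewed at time 0 with a 30 second lease refuses storage access from time 30 onward, while an authority that first failed to reach it at time 12 does not release its locks until time 43, so the two holders never overlap.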
