Light-Weight Leases for Storage-Centric Coordination

Processes that share read/write memory can reach agreement only in the presence of an eventual unique leader. A leader that fails must be replaceable, yet a live, well-performing leader should never be deposed. This paper presents the first leader algorithm for shared-memory environments that guarantees an eventual leader following global stabilization time. The construction is built from light-weight lease and renew primitives. The implementation is simple yet efficient. It is uniform, in the sense that the number of processes potentially contending for leadership is not known a priori.
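To make the lease and renew vocabulary concrete, the following is a minimal toy sketch of a generic timed lease. It is not the paper's algorithm. The class `LeaseRegister`, its methods, and the explicit `now` argument are all hypothetical names chosen for illustration. The sketch assumes each method runs atomically and that all processes share one clock, which is stronger than the read/write, eventually-timely model the abstract describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseRegister:
    """Toy shared register recording the current lease (hypothetical model).

    Assumption: every method executes atomically and `now` comes from a
    single shared clock. The paper's setting is weaker than this.
    """
    holder: Optional[str] = None
    expires_at: float = 0.0

    def lease(self, pid: str, now: float, duration: float) -> bool:
        """Acquire the lease if it is free or has expired by time `now`."""
        if self.holder is None or now >= self.expires_at:
            self.holder = pid
            self.expires_at = now + duration
            return True
        return False

    def renew(self, pid: str, now: float, duration: float) -> bool:
        """Extend the lease only if `pid` still holds an unexpired lease."""
        if self.holder == pid and now < self.expires_at:
            self.expires_at = now + duration
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current lease holder, or None if the lease has lapsed."""
        if self.holder is not None and now < self.expires_at:
            return self.holder
        return None


reg = LeaseRegister()
got_p1 = reg.lease("p1", now=0.0, duration=10.0)      # p1 becomes leader
blocked_p2 = reg.lease("p2", now=5.0, duration=10.0)  # lease still held by p1
renewed_p1 = reg.renew("p1", now=8.0, duration=10.0)  # live leader keeps its lease
# p1 stops renewing (for example, it crashes); the lease lapses at time 18.
takeover_p2 = reg.lease("p2", now=18.0, duration=10.0)
stale_p1 = reg.renew("p1", now=20.0, duration=10.0)   # p1 can no longer renew
```

In this toy model, a leader that keeps renewing before expiry is never displaced, while a leader that stops renewing is replaced once its lease lapses. These are the two requirements the abstract places on leadership.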
