The Chubby lock service for loosely-coupled distributed systems

We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance. Many instances of the service have been used for over a year, with several of them each handling a few tens of thousands of clients concurrently. The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.
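As a rough illustration of what "an interface much like a distributed file system with advisory locks" can mean, the following is a minimal in-memory sketch. It is not Chubby's API: the class `ToyLockService`, its methods, the client names, and the path `/demo/leader` are all hypothetical, and the sketch ignores replication, failures, and everything else that makes the real service reliable. It only shows that small files and locks share one namespace, and that an advisory lock coordinates cooperating clients without blocking access to the file itself.

```python
class ToyLockService:
    """Toy model: small named files, each with an advisory lock (hypothetical API)."""

    def __init__(self):
        self._contents = {}  # path -> bytes stored in the file
        self._holders = {}   # path -> id of the client holding the lock

    def try_acquire(self, path, client):
        """Take the lock on path if it is free; return True on success."""
        if path in self._holders:
            return False
        self._holders[path] = client
        return True

    def release(self, path, client):
        """Release the lock if client holds it; return True if it did."""
        if self._holders.get(path) != client:
            return False
        del self._holders[path]
        return True

    def write(self, path, data):
        # Advisory locking: ownership is NOT checked here, so clients must
        # agree among themselves to write only while holding the lock.
        self._contents[path] = data

    def read(self, path):
        """Return the file contents, or None if the file was never written."""
        return self._contents.get(path)


svc = ToyLockService()
got_a = svc.try_acquire("/demo/leader", "client-a")  # first caller wins
got_b = svc.try_acquire("/demo/leader", "client-b")  # lock already held
if got_a:
    # The winner advertises itself in the file, a low-volume storage use.
    svc.write("/demo/leader", b"client-a")
```

In this sketch a well-behaved `client-b` would see that `try_acquire` failed and would read the file to learn who holds the lock, rather than writing to it.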

Michael Burrows.  The Chubby Lock Service for Loosely-Coupled Distributed Systems , 2006, OSDI.