ZooKeeper: Wait-free Coordination for Internet-scale Systems

In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high-performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per-client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high-performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read-to-write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.

Mahadev Konar | Benjamin Reed | Flavio Paiva Junqueira | Patrick Hunt
