Centrifuge: Integrated Lease Management and Partitioning for Cloud Services

Making cloud services responsive is critical to providing a compelling user experience. Many large-scale sites, including LinkedIn, Digg and Facebook, address this need by deploying pools of servers that operate purely on in-memory state. Unfortunately, current technologies for partitioning requests across these in-memory server pools, such as network load balancers, lead to a frustrating programming model where requests for the same state may arrive at different servers. Leases are a well-known technique that can provide a better programming model by assigning each piece of state to a single server. However, in-memory server pools host an extremely large number of items, and granting a lease per item requires fine-grained leasing that is not supported in prior datacenter lease managers. This paper presents Centrifuge, a datacenter lease manager that solves this problem by integrating partitioning and lease management. Centrifuge consists of a set of libraries linked in by the in-memory servers and a replicated state machine that assigns responsibility for data items (including leases) to these servers. Centrifuge has been implemented and deployed in production as part of Microsoft's Live Mesh, a large-scale commercial cloud service in continuous operation since April 2008. When cloud services within Mesh were built using Centrifuge, they required fewer lines of code and did not need to introduce their own subtle protocols for distributed consistency. As cloud services become ever more complicated, this kind of reduction in complexity is an increasingly urgent need.
