Dynamo: amazon's highly available key-value store

Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

[1]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[2]  Robert H. Thomas,et al.  A Majority consensus approach to concurrency control for multiple copy databases , 1979, ACM Trans. Database Syst..

[3]  Philip A. Bernstein,et al.  An algorithm for concurrency control and recovery in replicated distributed databases , 1984, TODS.

[4]  Ralph C. Merkle,et al.  A Digital Signature Based on a Conventional Encryption Function , 1987, CRYPTO.

[5]  John S. Heidemann,et al.  Resolving File Conflicts in the Ficus File System , 1994, USENIX Summer.

[6]  Marvin Theimer,et al.  Managing update conflicts in Bayou, a weakly connected replicated storage system , 1995, SOSP.

[7]  Dennis Shasha,et al.  The dangers of replication and a solution , 1996, SIGMOD '96.

[8]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[9]  Eric A. Brewer,et al.  Cluster-based scalable network services , 1997, SOSP.

[10]  William J. Bolosky,et al.  Progress-based regulation of low-importance processes , 1999, SOSP.

[11]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[12]  John R. Douceur,et al.  Process-based regulation of low-importance processes , 2000, OPSR.

[13]  Antony I. T. Rowstron,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.

[14]  Indranil Gupta,et al.  On scalable and efficient distributed failure detectors , 2001, PODC '01.

[15]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[16]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[17]  David E. Culler,et al.  SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.

[18]  Miguel Castro,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OPSR.

[19]  GhemawatSanjay,et al.  The Google file system , 2003 .

[20]  Arif Merchant,et al.  FAB: building distributed enterprise disk arrays from commodity components , 2004, ASPLOS XI.

[21]  Emin Gün Sirer,et al.  Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays , 2004, NSDI.

[22]  John Kubiatowicz,et al.  Antiquity: exploiting a secure log for wide-area distributed storage , 2007, EuroSys '07.

[23]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.