Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases

Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.
