FoundationDB: A Distributed Unbundled Transactional Key Value Store

FoundationDB is an open-source transactional key-value store created more than ten years ago. It is one of the first systems to combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions (a.k.a. NewSQL). FoundationDB adopts an unbundled architecture that decouples an in-memory transaction management system, a distributed storage system, and a built-in distributed configuration system. Each subsystem can be independently provisioned and configured to achieve the desired scalability, high availability, and fault tolerance. FoundationDB uniquely integrates a deterministic simulation framework, used to test every new feature of the system under a myriad of possible faults. This rigorous testing makes FoundationDB extremely stable and allows developers to introduce and release new features at a rapid cadence. FoundationDB offers a minimal and carefully chosen feature set, which has enabled a range of disparate systems (from semi-relational databases, document and object stores, to graph databases and more) to be built as layers on top. FoundationDB is the underpinning of cloud infrastructure at Apple, Snowflake, and other companies, due to its consistency, robustness, and availability for storing user data, system metadata and configuration, and other critical information.
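The deterministic simulation idea can be illustrated with a deliberately tiny example. The sketch below is hypothetical and far simpler than FoundationDB's actual framework: every nondeterministic event (message reordering and message loss) is drawn from a single seeded random number generator, so a run that violates an invariant can be replayed exactly from its seed. `Replica`, `NaiveReplica`, `simulate`, `find_failing_seed`, and the fault model are all assumptions made for illustration.

```python
import random

class Replica:
    """Toy replica (hypothetical) that must apply updates in sequence order."""
    def __init__(self):
        self.applied = []   # values applied so far, in order
        self.next_seq = 0   # next sequence number to apply
        self.pending = {}   # out-of-order updates, buffered by seq

    def receive(self, seq, value):
        self.pending[seq] = value
        # Apply every buffered update that is now contiguous.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

class NaiveReplica(Replica):
    """Buggy variant: applies updates in arrival order, ignoring seq."""
    def receive(self, seq, value):
        self.applied.append(value)

def simulate(seed, replica_cls, n_updates=20, drop_prob=0.2):
    """One simulated run; every random choice comes from `seed`."""
    rng = random.Random(seed)
    replica = replica_cls()
    in_flight = [(seq, f"v{seq}") for seq in range(n_updates)]
    trace = []  # event log, identical for identical seeds
    while in_flight:
        # Injected fault: the network delivers messages in a random order.
        msg = in_flight.pop(rng.randrange(len(in_flight)))
        if rng.random() < drop_prob:
            # Injected fault: the message is lost and resent later.
            trace.append(("drop", msg[0]))
            in_flight.append(msg)
            continue
        trace.append(("deliver", msg[0]))
        replica.receive(*msg)
    return replica.applied, trace

def find_failing_seed(replica_cls, seeds, n_updates=20):
    """Return the first seed whose run violates the ordering invariant."""
    expected = [f"v{i}" for i in range(n_updates)]
    for seed in seeds:
        applied, _ = simulate(seed, replica_cls, n_updates)
        if applied != expected:
            return seed
    return None
```

Here `find_failing_seed(NaiveReplica, range(100))` should return a seed almost immediately, and calling `simulate` again with that seed reproduces the same event trace, which is what turns a rare failure into a debuggable one; `find_failing_seed(Replica, range(1000))` returns `None`.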

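To make the idea of a layer concrete, the following toy sketch shows how a document store could sit on top of an ordered key-value interface: each field of a document becomes its own key, and a whole document is read back with one prefix range read. `OrderedKV`, its `scan_prefix` method, the NUL-terminated key encoding, and the `put_document`/`get_document` helpers are hypothetical stand-ins, not FoundationDB's API. A real layer would issue these reads and writes inside a single ACID transaction, which this in-memory store does not model.

```python
from bisect import bisect_left

class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""
    def __init__(self):
        self._keys = []   # keys kept in sorted byte order
        self._vals = {}

    def set(self, key: bytes, value: bytes):
        if key not in self._vals:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._vals[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with `prefix`, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1

def encode_key(*parts: str) -> bytes:
    """Hypothetical encoding: NUL-terminated UTF-8 parts (parts must not contain NUL)."""
    return b"".join(p.encode("utf-8") + b"\x00" for p in parts)

def decode_key(key: bytes) -> list:
    return [p.decode("utf-8") for p in key.split(b"\x00")[:-1]]

def put_document(kv, collection, doc_id, doc):
    """Store each top-level field under the key (collection, doc_id, field)."""
    for field, value in doc.items():
        kv.set(encode_key(collection, doc_id, field), str(value).encode("utf-8"))

def get_document(kv, collection, doc_id):
    """Reassemble a document (values as strings) with a single prefix scan."""
    prefix = encode_key(collection, doc_id)
    return {decode_key(k)[2]: v.decode("utf-8") for k, v in kv.scan_prefix(prefix)}
```

For example, after `put_document(kv, "users", "1", {"name": "Ada"})`, `get_document(kv, "users", "1")` returns `{"name": "Ada"}`. Terminating every key component keeps document `"1"` from matching the prefix of document `"10"`.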