vCorfu: A Cloud-Scale Object Store on a Shared Log

This paper presents vCorfu, a strongly consistent, cloud-scale object store built over a shared log. vCorfu augments the traditional replication scheme of a shared log to provide fast reads, and it leverages a new technique, composable state machine replication, to compose large state machines from smaller ones, which makes state machine replication efficient enough to use in huge data stores. We show that vCorfu outperforms Cassandra, a popular state-of-the-art NoSQL store, while providing strong consistency (opacity, read-own-writes), efficient transactions, and global snapshots at cloud scale.
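The abstract only names composable state machine replication. As a rough, hypothetical illustration of the general idea, and not vCorfu's actual design, API, or replication protocol, the Python sketch below assumes a single-process, in-memory shared log whose entries are tagged with a stream id. It composes one large state machine (a map) out of many small ones (one register per key), so reading a key replays only that key's stream instead of the entire log. All names (`SharedLog`, `Register`, `ComposedMap`, and the `name/key` stream naming) are invented for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SharedLog:
    """Hypothetical in-memory shared log: one totally ordered sequence of
    entries, each tagged with the id of the stream (object) it belongs to."""
    entries: List[Tuple[str, Any]] = field(default_factory=list)
    # Assumed per-stream index of global positions, so one object's updates
    # can be read without scanning every entry in the log.
    stream_index: Dict[str, List[int]] = field(
        default_factory=lambda: defaultdict(list))

    def append(self, stream_id: str, update: Any) -> int:
        pos = len(self.entries)  # global position: total order across streams
        self.entries.append((stream_id, update))
        self.stream_index[stream_id].append(pos)
        return pos

    def read_stream(self, stream_id: str, after: int = -1) -> List[Tuple[int, Any]]:
        """Return (position, update) pairs of one stream newer than `after`."""
        return [(p, self.entries[p][1])
                for p in self.stream_index[stream_id] if p > after]


class Register:
    """Smallest state machine: a single value overwritten by each update."""

    def __init__(self) -> None:
        self.value: Any = None
        self.version = -1  # last global log position applied

    def apply(self, pos: int, update: Any) -> None:
        self.value = update
        self.version = pos


class ComposedMap:
    """A large state machine (a key-value map) composed of many small ones,
    one Register per key, each backed by its own stream in the shared log."""

    def __init__(self, log: SharedLog, name: str) -> None:
        self.log = log
        self.name = name
        self.registers: Dict[str, Register] = {}

    def _stream(self, key: str) -> str:
        return f"{self.name}/{key}"

    def put(self, key: str, value: Any) -> int:
        # Writes only append to the log; state changes happen on replay.
        return self.log.append(self._stream(key), value)

    def get(self, key: str) -> Any:
        reg = self.registers.setdefault(key, Register())
        # Replay only this key's updates newer than the local version.
        for pos, update in self.log.read_stream(self._stream(key), after=reg.version):
            reg.apply(pos, update)
        return reg.value


if __name__ == "__main__":
    log = SharedLog()
    users = ComposedMap(log, "users")
    users.put("alice", 1)
    users.put("bob", 2)
    users.put("alice", 3)
    print(users.get("alice"))  # 3
    # A second, independent view rebuilds the same state from the same log.
    replica = ComposedMap(log, "users")
    print(replica.get("bob"))  # 2
```

Because every update still receives a single position in the one shared log, all per-key state machines observe a common total order. The sketch deliberately omits replication, fault tolerance, transactions, and snapshots.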
