Low-Overhead Paxos Replication

Log replication is a key component of highly available database systems. To guarantee data consistency and reliability, modern database systems commonly employ the Paxos protocol to replicate transactional logs from a primary node to multiple backups. However, Paxos replication must store and synchronize additional metadata, such as the committed log sequence number (the commit point), to keep the database consistent. This adds storage and network overhead, which degrades throughput under update-intensive workloads. In this paper, we present an implementation of log replication and database recovery that adopts the idea of piggybacking: the commit point is embedded in the commit log records themselves. This practice retains the virtues of Paxos replication while substantially reducing disk and network IO. We implemented and evaluated our approach in a main-memory database system. Our experiments show that the piggybacking method offers 1.3× higher throughput than typical log replication with a separate synchronization mechanism.
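To make the piggybacking idea concrete, the following is a minimal, hypothetical sketch in Go (the paper provides no code). All names here, such as LogRecord, CommitLSN, and Replica, are invented for illustration: instead of propagating the commit point with dedicated sync messages and extra metadata writes, the leader embeds its current commit point in every replicated log record, and each backup advances its local commit point from that embedded value.

```go
package main

import "fmt"

// LogRecord is a hypothetical replicated log entry. Rather than
// synchronizing the commit point via separate metadata messages,
// the leader piggybacks its current commit point (CommitLSN) on
// every record it ships to the backups.
type LogRecord struct {
	LSN       uint64 // sequence number of this record
	CommitLSN uint64 // piggybacked commit point: all LSNs <= CommitLSN are committed
	Payload   []byte // transactional redo log data
}

// Replica models a backup node that learns the commit point from
// piggybacked records instead of dedicated sync messages.
type Replica struct {
	log       []LogRecord
	commitLSN uint64
}

// Append stores an incoming record and advances the local commit
// point from the piggybacked value; no extra disk or network IO
// is spent on commit-point propagation.
func (r *Replica) Append(rec LogRecord) {
	r.log = append(r.log, rec)
	if rec.CommitLSN > r.commitLSN {
		r.commitLSN = rec.CommitLSN
	}
}

func main() {
	leaderCommit := uint64(0)
	backup := &Replica{}

	for lsn := uint64(1); lsn <= 3; lsn++ {
		rec := LogRecord{
			LSN:       lsn,
			CommitLSN: leaderCommit, // piggyback the current commit point
			Payload:   []byte(fmt.Sprintf("txn-%d", lsn)),
		}
		backup.Append(rec)
		leaderCommit = lsn // entry acknowledged by a majority (simplified)
	}
	fmt.Printf("backup commit point: %d\n", backup.commitLSN)
}
```

Under this sketch, the backup's commit point lags by at most one record and catches up on the next append, so the commit point rides along with ordinary log traffic; on recovery, a node can likewise reconstruct the commit point by scanning its persisted records rather than reading separately stored metadata.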
