Fast Quorum-Based Log Replication and Replay for Fast Databases

Modern in-memory databases (IMDBs) can support highly concurrent OLTP workloads and generate massive volumes of transactional logs per second. Quorum-based replication protocols such as Paxos and Raft are widely used in distributed databases. However, replicating an IMDB is non-trivial because its high transaction rate brings new challenges. First, the leader node in quorum replication should adapt to varying transaction arrival rates and to the processing capability of follower nodes. Second, followers must replay logs under high concurrency to catch up with the leader's state and reduce the visibility gap. To this end, we built QuorumX, an efficient quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipeline batching into an adaptive log propagation scheme that delivers stable, high performance across various settings. Further, we propose a safe and coordination-free log replay scheme to minimize the visibility gap between the leader and follower IMDBs. Our evaluation with the YCSB and TPC-C benchmarks demonstrates that QuorumX achieves performance close to asynchronous primary-backup replication without sacrificing data consistency or availability.
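To make the quorum rule concrete, the following is a minimal sketch of how a leader in a Raft-style protocol might decide which log entries are committed. It is an illustration of generic majority-based commitment only, not QuorumX's batching or replay scheme; the function name `quorum_commit_index` and the `match_index` list are hypothetical names chosen for this example.

```python
from typing import List


def quorum_commit_index(match_index: List[int]) -> int:
    """Return the highest log index stored on a majority of replicas.

    match_index holds, for every replica including the leader, the index
    of the last log entry known to be persisted on that replica.
    (Hypothetical helper for illustration; not taken from QuorumX.)
    """
    if not match_index:
        raise ValueError("at least one replica is required")
    majority = len(match_index) // 2 + 1
    # After sorting in descending order, the entry at position
    # majority - 1 is replicated on at least `majority` replicas.
    ranked = sorted(match_index, reverse=True)
    return ranked[majority - 1]


if __name__ == "__main__":
    # Leader at index 10, one follower at 7, a slow follower at 5.
    print(quorum_commit_index([10, 7, 5]))
```

With three replicas the commit point follows the faster of the two followers, so a single slow follower delays neither commitment nor the leader's reply to clients.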
