Replication-driven Live Reconfiguration for Fast Distributed Transaction Processing

Recent in-memory database systems leverage advanced hardware features like RDMA to process millions of transactions per second. Distributed transaction processing systems can scale to even higher rates, especially for partitionable workloads. Unfortunately, these high rates are challenging to sustain during partition reconfiguration events. In this paper, we first show that state-of-the-art approaches cause notable performance disruption under fast transaction processing. To address this, we present DrTM+B, a live reconfiguration approach that seamlessly repartitions data while causing little performance disruption to running transactions. DrTM+B uses a pre-copy based mechanism, in which excessive data transfer is avoided by leveraging properties commonly found in recent transactional systems. DrTM+B's reconfiguration plans reduce data movement by preferring existing data replicas, and any data that must be moved is copied asynchronously from multiple replicas in parallel. DrTM+B further reuses the log forwarding mechanism of primary-backup replication to seamlessly track and forward dirty database tuples, avoiding the cost of iterative copying. To commit a reconfiguration plan in a transactionally safe way, DrTM+B introduces a cooperative commit protocol that synchronizes data and state among replicas. Evaluation on a working system based on DrTM+R with 3-way replication, using typical OLTP workloads such as TPC-C and SmallBank, shows that DrTM+B incurs only a small performance degradation during live reconfiguration. Both the reconfiguration time and the downtime are also minimal.
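
The idea of a reconfiguration plan that prefers existing replicas can be illustrated with a minimal sketch. This is not DrTM+B's planner: the names `Move` and `plan_moves`, the integer load model, and the rule of always choosing a backup holder when one exists are all assumptions made for illustration only.

```python
from typing import Dict, List, NamedTuple, Set


class Move(NamedTuple):
    partition: str
    dest: str
    needs_copy: bool  # False when dest already holds a backup replica


def plan_moves(to_move: List[str],
               backups: Dict[str, Set[str]],
               load: Dict[str, int],
               source: str) -> List[Move]:
    """Pick a new primary node for each partition leaving `source`.

    Hypothetical policy: a node that already holds a backup of the
    partition always wins over one that does not; ties are broken by
    current load, then by node name for determinism.
    """
    load = dict(load)  # work on a copy so the caller's dict is untouched
    plan: List[Move] = []
    for part in to_move:
        holders = backups.get(part, set())
        candidates = [n for n in load if n != source]
        # False sorts before True, so backup holders come first.
        dest = min(candidates, key=lambda n: (n not in holders, load[n], n))
        plan.append(Move(part, dest, needs_copy=dest not in holders))
        load[dest] += 1  # account for the partition just assigned
    return plan


if __name__ == "__main__":
    backups = {"p1": {"B", "C"}, "p2": {"C"}, "p3": set()}
    load = {"A": 5, "B": 1, "C": 2, "D": 0}
    for move in plan_moves(["p1", "p2", "p3"], backups, load, source="A"):
        print(move)
```

In this toy run, `p1` and `p2` go to nodes that already hold their backups and so need no bulk copy, while `p3` has no backup and falls back to the least-loaded node with a copy. A real planner would also have to weigh load against copy cost rather than always favoring a replica holder.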
