SWORD: scalable workload-aware data placement for transactional workloads

In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in cloud computing environments. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. Capturing and modeling the transactional workload over a period of time, and then exploiting that information for data placement and replication, has been shown to provide significant performance benefits, both in transaction latencies and in overall throughput. However, such workload-aware data placement approaches can incur very high overheads and, further, may perform worse than naive approaches if the workload changes. In this work, we propose SWORD, a Scalable WORkload-aware Data partitioning and placement approach for OLTP workloads that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during initial placement and during query execution at runtime. We model the workload as a hypergraph over the data items and propose a hypergraph compression technique to reduce the overheads of partitioning. To deal with workload changes, we propose an incremental data repartitioning technique that modifies data placement in small steps without resorting to complete workload repartitioning. We have built a workload-aware active replication mechanism in SWORD to increase availability and enable load balancing. We propose the use of fine-grained quorums, defined at the level of groups of tuples, to control the cost of distributed updates, improve throughput, and adapt to different workloads. To our knowledge, SWORD is the first system to use fine-grained quorums in this context. The results of our experimental evaluation of SWORD deployed on an Amazon EC2 cluster show that our techniques yield orders-of-magnitude reductions in partitioning and bookkeeping overheads and improve tolerance to failures and workload changes; we also show that choosing quorums based on query access patterns enables us to better handle workloads with different read and write access patterns.
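To make the abstract's workload model concrete, the following is a minimal, illustrative Python sketch, not SWORD's actual implementation: the names build_hypergraph, distributed_txn_weight, and quorum_ok are hypothetical. It models each transaction as a weighted hyperedge over the tuples it accesses, counts the transaction weight that becomes distributed under a given tuple-to-partition placement (the quantity a good placement minimizes), and checks the classic quorum-intersection conditions that fine-grained quorums would apply per group of tuples rather than per table.

```python
# Minimal, illustrative sketch (not SWORD's actual code); function names are hypothetical.
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

def build_hypergraph(transactions: Iterable[Tuple[str, ...]]) -> Dict[FrozenSet[str], int]:
    """Each vertex is a tuple (data item); each hyperedge is the set of tuples a
    transaction accesses, weighted by how often that access pattern appears."""
    return dict(Counter(frozenset(txn) for txn in transactions))

def distributed_txn_weight(hypergraph: Dict[FrozenSet[str], int],
                           placement: Dict[str, int]) -> int:
    """Total weight of 'cut' hyperedges: transactions whose tuples span more than
    one partition under the given tuple-to-partition placement."""
    return sum(w for edge, w in hypergraph.items()
               if len({placement[t] for t in edge}) > 1)

def quorum_ok(n_replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """Classic quorum-intersection conditions (R + W > N and 2W > N); fine-grained
    quorums apply such constraints per group of tuples, tuning R and W to the
    group's observed read/write mix."""
    return (read_quorum + write_quorum > n_replicas
            and 2 * write_quorum > n_replicas)

# Example: with tuples a, b on partition 0 and c, d on partition 1, only the
# transaction touching (a, c) is distributed, so the cut weight is 1.
workload = [("a", "b"), ("a", "b"), ("a", "c"), ("c", "d")]
hg = build_hypergraph(workload)
print(distributed_txn_weight(hg, {"a": 0, "b": 0, "c": 1, "d": 1}))  # -> 1
print(quorum_ok(n_replicas=3, read_quorum=1, write_quorum=3))        # -> True
```

A read-heavy tuple group would favor a small read quorum (cheap reads, expensive writes), while a write-heavy group would favor the reverse; choosing this per group, rather than globally, is the adaptability the abstract refers to.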
