Schism

We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve the scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, and iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E) to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.
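
The first phase described above can be illustrated with a minimal Python sketch. It is not the Schism implementation: the function names, the toy workload, and the greedy placement heuristic are all assumptions made for illustration, and the greedy step is only a stand-in for a real balanced graph partitioner. The sketch builds a co-access graph from a transaction trace (one node per tuple, edge weight equal to the number of transactions touching both tuples), assigns tuples to k size-capped partitions, and counts how many transactions end up spanning more than one partition.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Return {(u, v): weight}, where weight is the number of
    transactions that access both tuple u and tuple v."""
    weights = defaultdict(int)
    for txn in transactions:
        for u, v in combinations(sorted(set(txn)), 2):
            weights[(u, v)] += 1
    return dict(weights)


def count_distributed(transactions, assignment):
    """Count transactions whose tuples live on more than one partition."""
    return sum(
        1 for txn in transactions
        if len({assignment[t] for t in txn}) > 1
    )


def greedy_partition(transactions, k):
    """Toy stand-in for a graph partitioner (hypothetical heuristic):
    place each tuple on the non-full partition that already holds the
    tuples it is most often co-accessed with."""
    neighbors = defaultdict(dict)
    for (u, v), w in build_coaccess_graph(transactions).items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    tuples = sorted({t for txn in transactions for t in txn})
    cap = -(-len(tuples) // k)  # ceiling division keeps partitions balanced
    assignment, sizes = {}, [0] * k

    for t in tuples:
        affinity = [0] * k
        for other, w in neighbors[t].items():
            if other in assignment:
                affinity[assignment[other]] += w
        candidates = [p for p in range(k) if sizes[p] < cap]
        # Prefer high affinity, then the emptier partition, then lower id.
        best = max(candidates, key=lambda p: (affinity[p], -sizes[p], -p))
        assignment[t] = best
        sizes[best] += 1
    return assignment


# Toy workload: two groups of tuples that are only accessed together.
workload = [
    ["a1", "a2"], ["a2", "a3"], ["a1", "a3"],
    ["b1", "b2"], ["b2", "b3"], ["b1", "b3"],
]
workload_aware = greedy_partition(workload, k=2)
all_tuples = sorted({t for txn in workload for t in txn})
round_robin = {t: i % 2 for i, t in enumerate(all_tuples)}
```

On this toy workload, `count_distributed(workload, workload_aware)` can be compared with `count_distributed(workload, round_robin)` to see the effect of taking the access pattern into account; a production partitioner would also have to handle replication and much larger graphs, which this sketch ignores.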
