Scalable Self-Tuning Data Placement in Distributed Key-value Stores

This paper addresses the problem of autonomic data placement in replicated key-value stores. The goal is to automatically optimize replica placement so that it exploits locality patterns in data accesses and thereby minimizes inter-node communication. Doing this efficiently is challenging for two reasons: the system must find lightweight and scalable ways to identify the right data placement, and it must preserve fast data lookup once objects have been moved. The paper introduces new techniques that address both challenges. The first is addressed by optimizing, in a decentralized way, the placement of the objects that generate the most remote operations for each node. The second is addressed by combining consistent hashing with a novel data structure that provides efficient probabilistic data placement. These techniques have been integrated into Infinispan, a popular open-source key-value store. Performance results show that the throughput of the optimized system can be up to 6 times that of a baseline system using the widely adopted static placement based on consistent hashing.
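The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names (`Ring`, `Placement`), the single-owner model (no replication degree), the shared in-process state, and the plain dictionary standing in for the paper's probabilistic data structure are all assumptions made for illustration. Each node counts its remote accesses, relocates its top-k most remotely accessed keys to itself, and lookups check the relocation table before falling back to consistent hashing.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Stable 64-bit hash, so placement does not depend on PYTHONHASHSEED."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring with virtual nodes (the static baseline)."""

    def __init__(self, nodes, vnodes=64):
        points = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]


class Placement:
    """Hypothetical self-tuning placement layered on top of the ring."""

    def __init__(self, nodes):
        self.ring = Ring(nodes)
        # Assumption: an exact dict stands in for the paper's probabilistic
        # structure, whose design the abstract does not describe.
        self.overrides = {}
        self.remote_hits = {n: Counter() for n in nodes}

    def lookup(self, key: str) -> str:
        # Relocated keys take precedence; all others use consistent hashing.
        return self.overrides.get(key, self.ring.owner(key))

    def record_access(self, node: str, key: str) -> None:
        # Only accesses served by another node count as remote operations.
        if self.lookup(key) != node:
            self.remote_hits[node][key] += 1

    def optimize(self, node: str, k: int):
        """Move the k keys with the most remote accesses from `node` to it."""
        moved = [key for key, _ in self.remote_hits[node].most_common(k)]
        for key in moved:
            self.overrides[key] = node
        self.remote_hits[node].clear()
        return moved
```

Keeping the ring as the default means that only relocated keys need extra state, which is what keeps lookup cheap in this sketch; a real deployment would also have to propagate the relocation table to every node and handle replicas.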
