Replex: A Scalable, Highly Available Multi-Index Data Store

The need for scalable, high-performance data stores has led to the development of NoSQL databases, which achieve scalability by partitioning data over a single key. However, programmers often need to query data by other keys. Data stores support such queries either by querying every partition, which eliminates the benefits of partitioning, or by replicating additional indexes, which wastes the benefits of data replication. In this paper, we show there is no need to compromise scalability for functionality. We present Replex, a data store that enables efficient querying on multiple keys by rethinking data placement during replication. Traditionally, a data store is first globally partitioned, then each partition is replicated identically to multiple nodes. Instead, Replex relies on a novel replication unit, termed a replex, which partitions a full copy of the data based on its unique key. Replexes eliminate any additional overhead of maintaining indices, at the cost of increasing recovery complexity. To address this issue, we also introduce hybrid replexes, which enable a rich design space for trading off steady-state performance against faster recovery. We build, parameterize, and evaluate Replex on multiple dimensions and find that Replex surpasses the steady-state and failure recovery performance of HyperDex, a state-of-the-art multi-key data store.
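As a rough illustration of the placement idea described above, the following minimal Python sketch keeps one full copy of a table per queried column and partitions each copy by that column, so a lookup on any indexed column touches a single partition. The class names `Replex` and `Table`, the hash-based partitioning, and the in-memory dictionaries are all assumptions made for illustration. They are not taken from the paper's implementation, and the sketch does not model recovery or hybrid replexes.

```python
from collections import defaultdict


class Replex:
    """Hypothetical sketch: one full copy of the data, partitioned by one column."""

    def __init__(self, key_column, num_partitions):
        self.key_column = key_column
        self.num_partitions = num_partitions
        # Each partition maps a key value to the rows that carry that value.
        self.partitions = [defaultdict(list) for _ in range(num_partitions)]

    def partition_of(self, value):
        # Assumed placement rule: hash partitioning on the replex's own key.
        return hash(value) % self.num_partitions

    def insert(self, row):
        value = row[self.key_column]
        self.partitions[self.partition_of(value)][value].append(row)

    def lookup(self, value):
        # Only the single partition that owns `value` is consulted.
        return list(self.partitions[self.partition_of(value)].get(value, []))


class Table:
    """Hypothetical sketch: every replica doubles as an index on a different column."""

    def __init__(self, key_columns, num_partitions=4):
        self.replexes = {c: Replex(c, num_partitions) for c in key_columns}

    def insert(self, row):
        # Each row is written once per replex, so every replex holds a full copy.
        for replex in self.replexes.values():
            replex.insert(row)

    def lookup(self, column, value):
        return self.replexes[column].lookup(value)


table = Table(["id", "city"])
table.insert({"id": 1, "city": "Ithaca"})
table.insert({"id": 2, "city": "Ithaca"})
table.insert({"id": 3, "city": "Austin"})

by_id = table.lookup("id", 2)
by_city = table.lookup("city", "Ithaca")
```

In this sketch the two copies provide both redundancy and a second index, whereas identical replicas of an `id`-partitioned table would need to query every partition to answer the `city` lookup.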
