HyperDex: a distributed, searchable key-value store

Distributed key-value stores are now a standard component of high-performance web services and cloud computing applications. While key-value stores offer significant performance and scalability advantages compared to traditional databases, they achieve these properties through a restricted API that limits object retrieval---an object can only be retrieved by the (primary and only) key under which it was inserted. This paper presents HyperDex, a novel distributed key-value store that provides a unique search primitive that enables queries on secondary attributes. The key insight behind HyperDex is the concept of hyperspace hashing in which objects with multiple attributes are mapped into a multidimensional hyperspace. This mapping leads to efficient implementations not only for retrieval by primary key, but also for partially-specified secondary attribute searches and range queries. A novel chaining protocol enables the system to achieve strong consistency, maintain availability and guarantee fault tolerance. An evaluation of the full system shows that HyperDex is 12-13x faster than Cassandra and MongoDB for finding partially specified objects. Additionally, HyperDex achieves 2-4x higher throughput for get/put operations.

[1]  Allen Klinger,et al.  PATTERNS AND SEARCH STATISTICS , 1971 .

[2]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[3]  Jürg Nievergelt,et al.  The Grid File: An Adaptable, Symmetric Multikey File Structure , 1984, TODS.

[4]  T. H. Merrett,et al.  A class of data structures for associative searching , 1984, PODS.

[5]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[6]  Maurice Herlihy,et al.  Linearizability: a correctness condition for concurrent objects , 1990, TOPL.

[7]  Hanan Samet,et al.  Spatial Data Structures , 1995, Modern Database Systems.

[8]  Marvin Theimer,et al.  Managing update conflicts in Bayou, a weakly connected replicated storage system , 1995, SOSP.

[9]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[10]  Rudolf Bayer,et al.  The Universal B-Tree for Multidimensional Indexing: general Concepts , 1997, WWCA.

[11]  Eric Brewer,et al.  A design framework and a scalable storage platform to simplify internet service construction , 2000 .

[12]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[13]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[14]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[15]  Nancy A. Lynch,et al.  Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services , 2002, SIGA.

[16]  Randolph Y. Wang,et al.  SkipIndex : Towards a Scalable Peer-to-Peer Index Service for High Dimensional Data , 2004 .

[17]  Srinivasan Seshan,et al.  Mercury: supporting scalable multi-attribute range queries , 2004, SIGCOMM '04.

[18]  Hector Garcia-Molina,et al.  One torus to rule them all: multi-dimensional queries in P2P systems , 2004, WebDB '04.

[19]  Robbert van Renesse,et al.  Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.

[20]  Manish Parashar,et al.  Enabling flexible queries with guarantees in P2P systems , 2004, IEEE Internet Computing.

[21]  Farnoush Banaei Kashani,et al.  SWAM: a family of access methods for similarity-search in peer-to-peer data networks , 2004, CIKM '04.

[22]  Min Cai,et al.  MAAN: A Multi-Attribute Addressable Network for Grid Information Services , 2003, Journal of Grid Computing.

[23]  Johannes Gehrke,et al.  Querying peer-to-peer networks using P-trees , 2004, WebDB '04.

[24]  Beng Chin Ooi,et al.  Supporting multi-dimensional range queries in peer-to-peer systems , 2005, Fifth IEEE International Conference on Peer-to-Peer Computing (P2P'05).

[25]  David R. Karger,et al.  Arpeggio: Metadata Searching and Content Sharing with Chord , 2005, IPTPS.

[26]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[27]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[28]  Dan Pritchett,et al.  BASE: An Acid Alternative , 2008, ACM Queue.

[29]  Hans-Arno Jacobsen,et al.  PNUTS: Yahoo!'s hosted data serving platform , 2008, Proc. VLDB Endow..

[30]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[31]  Marcos K. Aguilera,et al.  A practical scalable distributed B-tree , 2008, Proc. VLDB Endow..

[32]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[33]  Jeffrey Dean,et al.  Designs, Lessons and Advice from Building Large Distributed Systems , 2009 .

[34]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[35]  Samuel Madden,et al.  Transactional Consistency and Automatic Management in an Application Data Cache , 2010, OSDI.

[36]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[37]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX Annual Technical Conference.

[38]  Erez Zadok,et al.  An efficient multi-tier tablet server storage architecture , 2011, SoCC.

[39]  Bin Fan,et al.  SILT: a memory-efficient, high-performance key-value store , 2011, SOSP.

[40]  VICCI: A Programmable Cloud-Computing Research Testbed , 2011 .

[41]  Michael J. Freedman,et al.  Don't settle for eventual: scalable causal consistency for wide-area storage with COPS , 2011, SOSP.

[42]  P. Schrimpf,et al.  Dynamic Programming , 2011 .

[43]  Jun Rao,et al.  Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore , 2011, Proc. VLDB Endow..

[44]  Mendel Rosenblum,et al.  Fast crash recovery in RAMCloud , 2011, SOSP.

[45]  Kimberly Keeton,et al.  LazyBase: trading freshness for performance in a scalable database , 2012, EuroSys '12.

[46]  Emin Gün Sirer,et al.  Commodifying Replicated State Machines with OpenReplica , 2012 .

[47]  Eddie Kohler,et al.  Cache craftiness for fast multicore key-value storage , 2012, EuroSys '12.

[48]  Modern Database,et al.  Modern database systems : the object model, interoperability, and beyond , 2014 .