Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation

The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time, there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. However, although the first analysis of a dynamic dictionary dates back more than 45 years (Knuth analyzed linear probing in 1963), the trade-off between these aspects of performance is still not completely understood. In this paper we settle two fundamental open problems:
\begin{itemize}
\item We construct the first dynamic dictionary that enjoys the best of both worlds: it stores $n$ elements using $(1 + \epsilon) n$ memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any $\epsilon = \Omega( (\log \log n / \log n)^{1/2} )$ and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, every operation is performed in constant time, independent of $\epsilon$. The construction is a two-level variant of cuckoo hashing, augmented with a ``backyard'' that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on $\epsilon$.
\item We present a variant of the above construction that uses only $(1 + o(1)) \mathcal{B}$ bits, where $\mathcal{B}$ is the information-theoretic lower bound for representing a set of size $n$ taken from a universe of size $u$, and guarantees constant-time operations in the worst case with high probability, as before. This problem was open even in the {\em amortized} setting. One of the main ingredients of our construction is a permutation-based variant of cuckoo hashing, which significantly improves the space consumption of cuckoo hashing when dealing with a rather small universe.
\end{itemize}
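To make the two-level idea concrete, the following is a minimal Python sketch, not the construction from the paper. It assumes a first level of fixed-capacity bins whose overflow is pushed into a small two-table cuckoo hashing "backyard". The class name `BackyardSketch`, its parameters, and the use of Python's built-in `hash` with random salts are all illustrative assumptions; the sketch omits the de-amortization, the perfect hashing inside bins, and the succinct encoding that the abstract describes.

```python
import random


class BackyardSketch:
    """Toy two-level set: fixed-capacity bins plus a cuckoo 'backyard'.

    Hypothetical illustration only; keys must not be None.
    """

    def __init__(self, num_bins, bin_capacity, backyard_size, seed=0, max_kicks=64):
        rng = random.Random(seed)
        self.bins = [[] for _ in range(num_bins)]
        self.bin_capacity = bin_capacity
        # Two cuckoo tables; each key has one candidate slot per table.
        self.tables = [[None] * backyard_size, [None] * backyard_size]
        # Salted built-in hash stands in for the hash families a real scheme needs.
        self.salts = [rng.getrandbits(64) for _ in range(3)]
        self.max_kicks = max_kicks

    def _h(self, i, key, m):
        return hash((self.salts[i], key)) % m

    def _bin(self, key):
        return self.bins[self._h(0, key, len(self.bins))]

    def _slot(self, t, key):
        return self._h(t + 1, key, len(self.tables[t]))

    def lookup(self, key):
        # One bounded-size bin plus two backyard slots are inspected.
        if key in self._bin(key):
            return True
        return any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        b = self._bin(key)
        if len(b) < self.bin_capacity:
            b.append(key)
            return
        # Bin is full: place the key in the backyard, evicting as needed.
        t = 0
        for _ in range(self.max_kicks):
            s = self._slot(t, key)
            key, self.tables[t][s] = self.tables[t][s], key
            if key is None:
                return
            t = 1 - t
        # The key held at this point is dropped; a real scheme would rehash
        # or queue the pending work instead of failing.
        raise RuntimeError("backyard insertion exceeded max_kicks")

    def delete(self, key):
        b = self._bin(key)
        if key in b:
            b.remove(key)
            return
        for t in (0, 1):
            s = self._slot(t, key)
            if self.tables[t][s] == key:
                self.tables[t][s] = None


d = BackyardSketch(num_bins=64, bin_capacity=8, backyard_size=1024)
for k in range(400):
    d.insert(k)
```

In this sketch a lookup touches one bin of bounded size and two backyard slots, which is where the constant lookup time comes from. Insertions here are only bounded by `max_kicks`; obtaining constant worst-case updates requires the de-amortization mentioned in the abstract.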
