Hash, Displace, and Compress

A hash function h, i.e., a function from the set U of all keys to the range [m] = {0,...,m − 1}, is called a perfect hash function (PHF) for a subset S ⊆ U of size n ≤ m if h is one-to-one on S. The important performance parameters of a PHF are representation size, evaluation time, and construction time. In this paper, we present an algorithm that yields PHFs with expected representation size very close to optimal while retaining O(n) expected construction time and O(1) worst-case evaluation time. For example, for m = 1.23n we obtain a PHF that uses 1.4 bits per key, and for m = 1.01n we obtain 1.98 bits per key, which was not achievable with previously known methods. Our algorithm is inspired by several known algorithms; the main new feature is that we combine a modification of Pagh’s “hash-and-displace” approach with data compression on a sequence of hash function indices. Our algorithm can also be used for k-perfect hashing, where at most k keys may be mapped to the same value.
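
The general idea of storing one hash function index per bucket can be illustrated with a small sketch. It is not the algorithm of the paper: the helper names (`_hash`, `build_phf`, `evaluate`), the use of seeded BLAKE2b as the hash family, the average bucket size of 4, and the search limit are all assumptions made for illustration, and the compression of the index sequence is omitted entirely.

```python
import hashlib


def _hash(key: str, seed: int, modulus: int) -> int:
    """Hypothetical hash family: seeded BLAKE2b reduced modulo `modulus`."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def build_phf(keys, m, avg_bucket_size=4, max_index=1_000_000):
    """Return one hash function index per bucket so that all keys get distinct slots in [m]."""
    keys = list(keys)
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    if m < len(keys):
        raise ValueError("range size m must be at least the number of keys")

    # Step 1 (hash): split the keys into buckets with a first-level hash (seed 0).
    r = max(1, len(keys) // avg_bucket_size)
    buckets = [[] for _ in range(r)]
    for key in keys:
        buckets[_hash(key, 0, r)].append(key)

    # Step 2 (displace): place buckets in order of decreasing size; for each one,
    # take the smallest index whose hash function puts its keys into free,
    # pairwise distinct slots.
    occupied = [False] * m
    indices = [0] * r
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue  # empty buckets keep index 0, which is never used for lookups
        for index in range(1, max_index):
            slots = [_hash(key, index, m) for key in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                for s in slots:
                    occupied[s] = True
                indices[b] = index
                break
        else:
            raise RuntimeError("no suitable index found; increase m or max_index")

    # Step 3 (compress) would encode `indices` compactly; it is left out here.
    return indices


def evaluate(key: str, indices, m: int) -> int:
    """Evaluate the PHF: look up the bucket's index, then apply that hash function."""
    bucket = _hash(key, 0, len(indices))
    return _hash(key, indices[bucket], m)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    m = 1230  # m = 1.23n, one of the load factors mentioned above
    indices = build_phf(keys, m)
    values = {evaluate(k, indices, m) for k in keys}
    print(len(values) == len(keys))
```

Because large buckets are placed first while the table is still mostly empty, most buckets tend to succeed with a small index, which is what makes the index sequence a natural target for compression.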
