Divide and discriminate: algorithm for deterministic and fast hash lookups

Exact and approximate membership lookups are among the most widely used primitives in network applications. Hash tables are commonly used to implement these primitives because they provide O(1) operations at moderate load (table occupancy). At high load, however, collisions become prevalent, which makes lookup time highly non-deterministic and reduces average performance. Slow and non-deterministic lookups are detrimental to the performance and scalability of modern platforms, such as ASICs/FPGAs and multi-core processors, that use highly parallel compute and memory structures. To combat non-determinism and achieve high-rate lookups, a recent series of papers employs a compact on-chip memory that augments the main hash table and stores certain key information. Unfortunately, these schemes require substantial on-chip memory space and bandwidth, and fail to provide a 100% guarantee on lookup rate. In this paper, we address these shortcomings with a novel construction that requires 10-fold less on-chip memory and guarantees that every lookup requires a single hash table access at near-full load. The on-chip memory uses only between 1 and 2 bits per item and needs a small number of accesses (between two and four) per lookup. This represents a substantial improvement over previous schemes and can therefore help realize highly scalable and deterministic lookup tables on modern parallel platforms.
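The effect of load on lookup determinism can be illustrated with a small simulation. The sketch below is not the construction proposed in the paper. It assumes a plain open-addressing table with linear probing (a scheme chosen here only for illustration), and the function name `probe_stats` and its parameters are hypothetical. It fills the table to a given load and reports the average and worst-case number of slots examined per successful lookup.

```python
import random


def probe_stats(num_slots, load, seed=0):
    """Fill a linear-probing table to `load` and return the
    (average, worst-case) number of probes per successful lookup.

    Assumption: keys are drawn uniformly at random, so `key % num_slots`
    stands in for a uniform hash function.
    """
    rng = random.Random(seed)
    table = [None] * num_slots
    keys = rng.sample(range(10 * num_slots), int(load * num_slots))

    # Insert: walk forward from the home slot until an empty slot is found.
    for key in keys:
        slot = key % num_slots
        while table[slot] is not None:
            slot = (slot + 1) % num_slots
        table[slot] = key

    # Lookup: count how many slots are examined before the key is found.
    probes = []
    for key in keys:
        slot = key % num_slots
        count = 1
        while table[slot] != key:
            slot = (slot + 1) % num_slots
            count += 1
        probes.append(count)

    return sum(probes) / len(probes), max(probes)


if __name__ == "__main__":
    for load in (0.5, 0.8, 0.95):
        average, worst = probe_stats(4096, load)
        print(f"load={load:.2f}  average probes={average:.2f}  worst case={worst}")
```

In this sketch both the average and the spread of the probe count grow as the load approaches 1. That variability is what the single-access guarantee described above is meant to eliminate.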