Design of a near-minimal dynamic perfect hash function on embedded device

It is commonly believed that constructing perfect hash tables with a high load factor is difficult for large datasets of around a million records. The problem becomes harder still when new records can be added to the hash table incrementally. In this article, we present the design of a dynamic perfect hash function for embedded devices, based on simple bit-shuffle and bit-extraction operations. The load factor can reach 100%, and the amortized memory cost of the hash function is about 7 to 15 bits per key for 32-bit keys. Incremental updates to the hash table are supported. The perfect hash function for a dataset of 1 million keys can be constructed in a few seconds of CPU time.
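To make the idea of a bit-extraction hash concrete, the following is a minimal sketch and not the construction described in the article. It assumes a tiny static key set and uses a brute-force search for the fewest bit positions whose extracted values are distinct for every key. The bit-shuffle step and incremental updates are not modeled, and the names `extract_bits`, `find_perfect_positions`, and `lookup` are hypothetical.

```python
from itertools import combinations

def extract_bits(key, positions):
    """Gather the bits of `key` at `positions` (first position becomes the LSB)."""
    out = 0
    for i, p in enumerate(positions):
        out |= ((key >> p) & 1) << i
    return out

def find_perfect_positions(keys, width=32):
    """Brute-force search (illustrative only) for the fewest bit positions
    that map every key to a distinct index. Returns None if none exist."""
    n = len(keys)
    min_bits = max(1, (n - 1).bit_length())  # at least ceil(log2(n)) bits are needed
    for nbits in range(min_bits, width + 1):
        for positions in combinations(range(width), nbits):
            if len({extract_bits(k, positions) for k in keys}) == n:
                return positions
    return None

# Four 32-bit keys that differ only in bits 2 and 6.
keys = [0x10, 0x14, 0x50, 0x54]
positions = find_perfect_positions(keys)

# Build the table: one slot per possible extracted value.
table = [None] * (1 << len(positions))
for k in keys:
    table[extract_bits(k, positions)] = k

def lookup(key):
    """Membership test with a single probe: extract the index, compare the stored key."""
    return table[extract_bits(key, positions)] == key

load_factor = sum(slot is not None for slot in table) / len(table)
```

Here four keys fill a four-slot table, so the load factor is 100% and each lookup needs exactly one probe. An exhaustive search like this does not scale to a million keys; it only illustrates why selecting the right key bits can yield a collision-free index.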
