Segmented hash

Hash tables provide efficient table implementations, achieving O(1) query, insert, and delete operations at low loads. At moderate or high loads, however, collisions become frequent and performance degrades. In this paper, we propose the segmented hash table architecture, which ensures constant-time hash operations at high loads with high probability. To achieve this, the hash memory is divided into N logical segments so that each incoming key has N potential storage locations; the destination segment is chosen so as to minimize collisions. In this way, collisions and the associated probe sequences are dramatically reduced. To keep memory utilization low, probabilistic filters are kept on-chip, allowing the N segments to be accessed without increasing the number of off-chip memory operations. These filters are kept small and accurate with the help of a novel algorithm, called selective filter insertion, which keeps the segments balanced while minimizing false positive rates (i.e., incorrect filter predictions). The performance of our scheme is quantified via analytical modeling and software simulations. Moreover, we discuss efficient implementations that are easily realizable in modern device technologies. The performance benefits are significant: average search cost is reduced by 40% or more, while the likelihood of requiring more than one memory operation per search is reduced by several orders of magnitude.
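The general idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names, the chained buckets, the salted SHA-256 hashing, and the filter sizes are all assumptions made for illustration, a plain Bloom filter stands in for the on-chip probabilistic filters, and selective filter insertion is not implemented. The sketch only shows a key being placed in whichever of its N candidate buckets is least loaded, with per-segment filters consulted before any segment is probed.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; stands in for the on-chip filters (assumption)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SegmentedHashSketch:
    """Hypothetical N-segment table with one filter per segment."""

    def __init__(self, num_segments=4, buckets_per_segment=64):
        self.num_segments = num_segments
        self.buckets_per_segment = buckets_per_segment
        self.segments = [
            [[] for _ in range(buckets_per_segment)] for _ in range(num_segments)
        ]
        self.filters = [BloomFilter() for _ in range(num_segments)]
        self.memory_probes = 0  # counts simulated off-chip bucket reads

    def _bucket_index(self, segment, key):
        # Assumption: each segment uses its own salted hash of the key.
        digest = hashlib.sha256(f"seg{segment}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets_per_segment

    def _locate(self, key):
        # Probe only the segments whose filter reports a possible match.
        for s in range(self.num_segments):
            if not self.filters[s].might_contain(key):
                continue
            self.memory_probes += 1
            bucket = self.segments[s][self._bucket_index(s, key)]
            for i, (stored_key, _) in enumerate(bucket):
                if stored_key == key:
                    return bucket, i
        return None

    def insert(self, key, value):
        hit = self._locate(key)
        if hit is not None:  # key already stored: overwrite in place
            bucket, i = hit
            bucket[i] = (key, value)
            return
        # Pick the segment whose candidate bucket currently holds the fewest keys.
        target = min(
            range(self.num_segments),
            key=lambda s: len(self.segments[s][self._bucket_index(s, key)]),
        )
        self.segments[target][self._bucket_index(target, key)].append((key, value))
        self.filters[target].add(key)

    def get(self, key):
        hit = self._locate(key)
        if hit is None:
            return None
        bucket, i = hit
        return bucket[i][1]
```

A table built with `SegmentedHashSketch(num_segments=4)` can be filled with `insert` and read with `get`; comparing `memory_probes` against the number of lookups gives a rough feel for how often a filter false positive costs an extra bucket read. Deletion is omitted because a plain Bloom filter cannot remove keys.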
