Efficient Hash Probes on Modern Processors

Bucketized versions of Cuckoo hashing can achieve 95-99% occupancy, without any space overhead for pointers or other structures. However, such methods typically need to consult multiple hash buckets per probe, and have therefore been seen as having worse probe performance than conventional techniques for large tables. We consider workloads typical of database and stream processing, in which keys and payloads are small, and in which a large number of probes are processed in bulk. We show how to improve probe performance by (a) eliminating branch instructions from the probe code, enabling better scheduling and latency-hiding by modern processors, and (b) using SIMD instructions to process multiple keys/payloads in parallel. We show that on modern architectures, probes to a bucketized Cuckoo hash table can be processed much faster than conventional hash table probes, for both small and large memory-resident tables. On a Pentium 4, a probe is two to four times faster, while on the Cell SPE processor a probe is ten times faster.

[1]  Martti Penttonen,et al.  A Reliable Randomized Algorithm for the Closest-Pair Problem , 1997, J. Algorithms.

[2]  Donald E. Knuth,et al.  The art of computer programming: sorting and searching (volume 3) , 1973 .

[3]  Marcin Zukowski,et al.  Architecture-conscious hashing , 2006, DaMoN '06.

[4]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[5]  Mikkel Thorup Even strongly universal hashing is pretty fast , 2000, SODA '00.

[6]  Paul G. Spirakis,et al.  Space Efficient Hash Tables with Worst Case Constant Access Time , 2003, Theory of Computing Systems.

[7]  Rasmus Pagh,et al.  Cuckoo Hashing , 2001, Encyclopedia of Algorithms.

[8]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[9]  Kenneth A. Ross,et al.  Implementing database operations using SIMD instructions , 2002, SIGMOD '02.

[10]  Martin Dietzfelbinger,et al.  Almost random graphs with simple hash functions , 2003, STOC '03.

[11]  Rina Panigrahy,et al.  Efficient hashing with lookups in two memory accesses , 2004, SODA '05.

[12]  Úlfar Erlingsson,et al.  A cool and practical alternative to traditional hash tables , 2006 .

[13]  Martin Dietzfelbinger,et al.  Balanced allocation and dictionaries with tightly packed constant size bins , 2005, Theor. Comput. Sci..

[14]  Donald E. Knuth,et al.  The art of computer programming, volume 3: (2nd ed.) sorting and searching , 1998 .

[15]  Kenneth A. Ross,et al.  Improving Database Performance on Simultaneous Multithreading Processors , 2005, VLDB.