Building an Efficient Hash Table on the GPU

Publisher Summary

This chapter describes a straightforward algorithm for parallel hash table construction on the graphics processing unit (GPU). The table is constructed in global memory, and atomic operations are used to detect and resolve collisions. Because hashing scatters keys across the table, these memory accesses are uncoalesced, and construction and retrieval performance are limited almost entirely by the time they take. That time grows linearly with the total number of accesses, so the design goal is to minimize the average number of accesses per insertion or lookup. In fact, the method guarantees a constant worst-case bound on the number of accesses per lookup.

One alternative to a hash table is to store the data in a sorted array and access it via binary search. Sorted arrays can be built very quickly with radix sort, whose memory access pattern is highly localized, allowing the GPU to coalesce many memory accesses and reduce their cost significantly. However, binary search, which needs about lg N probes in the worst case, is much less efficient than a hash table lookup.

GPU hash tables are useful in interactive graphics applications, where they store sparse spatial data, usually 3D models voxelized on a uniform grid. Rather than store the entire voxel grid, which is mostly empty, a hash table is built to hold just the occupied voxels.
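The summary does not name its collision-resolution scheme. One scheme that combines atomic exchanges with a constant worst-case lookup bound is cuckoo hashing, and the sketch below assumes it. It is a CPU model, not the chapter's implementation: std::thread workers stand in for GPU threads, std::atomic<uint64_t>::exchange stands in for CUDA's atomicExch, and the class name CuckooTable, the hash family, and the limits kMaxIterations and kMaxAttempts are hypothetical choices.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// CPU sketch of GPU-style cuckoo hashing. Each std::thread plays the role of a
// GPU thread, and std::atomic<uint64_t>::exchange stands in for CUDA's
// atomicExch on a 64-bit slot in global memory.
// Assumptions: keys are unique, key 0xFFFFFFFF is reserved to mark empty slots,
// and the hash family, kMaxIterations, and kMaxAttempts are illustrative choices.
class CuckooTable {
 public:
  static constexpr int kNumHashes = 3;         // each lookup probes at most 3 slots
  static constexpr int kMaxIterations = 100;   // eviction-chain limit per insert
  static constexpr uint32_t kMaxAttempts = 8;  // rebuilds with fresh hash functions
  static constexpr uint32_t kEmptyKey = 0xFFFFFFFFu;
  static constexpr uint64_t kEmpty = ~0ull;    // key = value = 0xFFFFFFFF

  explicit CuckooTable(std::size_t slots) : slots_(slots), table_(slots) {
    assert(slots > 0);
  }

  // Inserts all pairs in parallel; if any eviction chain runs too long, the
  // table is cleared and rebuilt with a new set of hash functions.
  bool Build(const std::vector<std::pair<uint32_t, uint32_t>>& items, int numThreads) {
    for (uint32_t attempt = 0; attempt < kMaxAttempts; ++attempt) {
      seed_ = attempt;
      for (auto& slot : table_) slot.store(kEmpty);
      std::atomic<bool> failed{false};
      std::vector<std::thread> workers;
      for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
          for (std::size_t i = t; i < items.size(); i += numThreads)
            if (!Insert(items[i].first, items[i].second)) failed = true;
        });
      }
      for (auto& w : workers) w.join();
      if (!failed) return true;
    }
    return false;
  }

  // Constant worst case: checks only the kNumHashes candidate slots.
  std::optional<uint32_t> Find(uint32_t key) const {
    for (int i = 0; i < kNumHashes; ++i) {
      uint64_t entry = table_[Hash(i, key)].load();
      if (static_cast<uint32_t>(entry >> 32) == key) return static_cast<uint32_t>(entry);
    }
    return std::nullopt;
  }

 private:
  // Hypothetical hash family: a 64-bit mix of the key with a per-function,
  // per-attempt constant, reduced modulo the table size.
  uint32_t Hash(int i, uint32_t key) const {
    uint64_t x = key ^ (0x9E3779B97F4A7C15ull * (seed_ * kNumHashes + i + 1));
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDull;
    x ^= x >> 33;
    return static_cast<uint32_t>(x % slots_);
  }

  bool Insert(uint32_t key, uint32_t value) {
    assert(key != kEmptyKey);
    uint64_t entry = (static_cast<uint64_t>(key) << 32) | value;
    uint32_t slot = Hash(0, key);
    for (int iter = 0; iter < kMaxIterations; ++iter) {
      // Swap our entry in; whatever was there is now ours to place.
      entry = table_[slot].exchange(entry);
      if (entry == kEmpty) return true;
      // The evicted entry sits at one of its hash locations; move it to the next one.
      uint32_t evicted = static_cast<uint32_t>(entry >> 32);
      int next = 0;
      for (int i = 0; i < kNumHashes; ++i) {
        if (Hash(i, evicted) == slot) { next = (i + 1) % kNumHashes; break; }
      }
      slot = Hash(next, evicted);
    }
    return false;  // the held entry is lost; the caller rebuilds the table
  }

  std::size_t slots_;
  uint32_t seed_ = 0;
  std::vector<std::atomic<uint64_t>> table_;
};
```

Each insertion swaps its entry into a candidate slot and takes over whatever it displaced, so every item is always either in the table or held by exactly one thread; an eviction chain that grows too long triggers a rebuild with new hash functions. In a CUDA version, each pair would typically get its own thread, and the slots would be unsigned long long values updated with atomicExch.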
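For comparison with the sorted-array alternative, the sketch below pairs an LSD radix sort with a binary search that counts its probes. The function names RadixSort and BinarySearch are hypothetical, and the sort is a single-threaded CPU version meant only to show the digit-by-digit counting passes, not a GPU radix sort.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit keys, 8 bits per pass. Each pass is a stable
// counting sort: histogram the digit, prefix-sum the counts into output
// offsets, then scatter. Reads stream through the input in order, and writes
// go to 256 buckets, each filled sequentially.
std::vector<uint32_t> RadixSort(std::vector<uint32_t> keys) {
  std::vector<uint32_t> tmp(keys.size());
  for (int shift = 0; shift < 32; shift += 8) {
    std::array<std::size_t, 257> offset{};  // offset[d + 1] starts as the count of digit d
    for (uint32_t k : keys) ++offset[((k >> shift) & 0xFF) + 1];
    for (int d = 0; d < 256; ++d) offset[d + 1] += offset[d];  // offset[d] = start of digit d
    for (uint32_t k : keys) tmp[offset[(k >> shift) & 0xFF]++] = k;
    keys.swap(tmp);
  }
  return keys;
}

// Binary search over a sorted array that also counts probes (elements read).
// The worst case is floor(lg N) + 1 probes, so it grows with N.
bool BinarySearch(const std::vector<uint32_t>& sorted, uint32_t key, int* probes) {
  assert(probes != nullptr);
  std::size_t lo = 0, hi = sorted.size();  // candidates lie in [lo, hi)
  *probes = 0;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    ++*probes;
    if (sorted[mid] == key) return true;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return false;
}
```

With 1,024 keys, the slowest successful search reads 11 elements, matching floor(lg N) + 1, whereas a hash table with a constant probe bound needs the same small number of probes no matter how large N gets.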
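To make the sparse-storage idea concrete, the sketch below packs voxel coordinates into a single 32-bit key and keeps only occupied voxels. The 10-bits-per-axis layout, the names PackVoxel and SparseVoxelGrid, and the use of std::unordered_map in place of a GPU hash table are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical key layout: 10 bits per axis, so grids up to 1024^3 fit in a
// 32-bit key. The largest packed key is 0x3FFFFFFF, which never collides with
// an all-ones "empty" sentinel.
uint32_t PackVoxel(uint32_t x, uint32_t y, uint32_t z) {
  assert(x < 1024 && y < 1024 && z < 1024);
  return x | (y << 10) | (z << 20);
}

// Stores only the occupied voxels. std::unordered_map stands in for the GPU
// hash table; empty voxels are simply absent and cost no memory.
class SparseVoxelGrid {
 public:
  void Set(uint32_t x, uint32_t y, uint32_t z, uint32_t payload) {
    occupied_[PackVoxel(x, y, z)] = payload;
  }
  // Returns false for an empty voxel; otherwise writes its payload.
  bool Get(uint32_t x, uint32_t y, uint32_t z, uint32_t* payload) const {
    auto it = occupied_.find(PackVoxel(x, y, z));
    if (it == occupied_.end()) return false;
    *payload = it->second;
    return true;
  }
  std::size_t size() const { return occupied_.size(); }

 private:
  std::unordered_map<uint32_t, uint32_t> occupied_;  // packed coords -> payload
};
```

A dense 1024^3 grid of 32-bit payloads would occupy 4 GiB regardless of content, whereas this structure grows only with the number of occupied voxels (plus the hash table's own overhead).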
