A Dynamic Hash Table for the GPU

We design and implement a fully concurrent dynamic hash table for GPUs whose performance is comparable to that of state-of-the-art static hash tables. We propose a warp-cooperative work-sharing strategy that reduces branch divergence and provides an efficient alternative to traditional per-thread (or per-warp) work assignment and processing. Using this strategy, we build a dynamic, non-blocking concurrent linked list, the slab list, that supports asynchronous, concurrent updates (insertions and deletions) as well as search queries. We use the slab list to implement a dynamic hash table with chaining, the slab hash. On an NVIDIA Tesla K40c GPU, the slab hash sustains up to 512 M updates/s and up to 937 M search queries/s. We also design a warp-synchronous dynamic memory allocator, SlabAlloc, that suits the high-performance needs of the slab hash. SlabAlloc allocates memory at a rate of 600 M allocations/s, which is up to 37x faster than alternative methods in similar scenarios.
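To make the warp-cooperative work-sharing idea more concrete, the following is a minimal CPU-side sketch in C++ that simulates a single 32-lane warp searching a slab list. It is an illustration under stated assumptions, not the implementation described above: the names `Slab`, `warp_search`, `lowest_set_lane`, `kEmptyKey`, and `kNoNext`, the 32-key slab layout, and the index-based pool are all hypothetical. The bit mask `work_queue` stands in for a warp ballot, and copying one lane's key stands in for a warp broadcast. Each lane brings its own query, but the warp serves one query at a time, with every lane inspecting one slot of the current slab.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch (not taken from the paper).
constexpr int kWarpSize = 32;
constexpr std::uint32_t kEmptyKey = 0xFFFFFFFFu;  // marks an unused slot
constexpr int kNoNext = -1;                       // end-of-list marker

// Assumed layout: one slab holds 32 keys, so each lane can check one slot.
struct Slab {
    std::array<std::uint32_t, kWarpSize> keys;
    int next;  // index of the next slab in the pool, or kNoNext
};

// Index of the lowest set bit; the caller guarantees mask != 0.
int lowest_set_lane(std::uint32_t mask) {
    int lane = 0;
    while (((mask >> lane) & 1u) == 0) ++lane;
    return lane;
}

// Simulates one warp: lane i wants to look up queries[i] (queries must not
// equal kEmptyKey). Returns, per lane, whether its key is in the list.
std::array<bool, kWarpSize> warp_search(
    const std::vector<Slab>& pool, int head,
    const std::array<std::uint32_t, kWarpSize>& queries) {
    assert(head == kNoNext || static_cast<std::size_t>(head) < pool.size());

    std::array<bool, kWarpSize> found{};
    // Stand-in for a warp ballot: one bit per lane that still has work.
    std::uint32_t work_queue = 0xFFFFFFFFu;

    while (work_queue != 0) {
        // The whole warp agrees on which lane's query to serve next.
        int src_lane = lowest_set_lane(work_queue);
        // Stand-in for a warp broadcast of that lane's key.
        std::uint32_t key = queries[src_lane];

        bool hit = false;
        for (int s = head; s != kNoNext && !hit; s = pool[s].next) {
            // On a GPU these 32 comparisons would run in lockstep,
            // one per lane, over a single coalesced slab read.
            for (int lane = 0; lane < kWarpSize; ++lane) {
                if (pool[s].keys[lane] == key) hit = true;
            }
        }

        found[src_lane] = hit;
        work_queue &= ~(1u << src_lane);  // this lane's work is done
    }
    return found;
}
```

In this sketch, all lanes follow the same control path while a query is being served, which is the property that would reduce branch divergence on real hardware. The serial inner loop over lanes exists only because the simulation runs on a CPU.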
