A Distributed Hash Table for Shared Memory

Distributed algorithms for graph searching require a high-performance, CPU-efficient hash table that supports find-or-put. This operation either inserts data or indicates that the data was already present. This paper focuses on the design and evaluation of such a hash table, targeting supercomputers. The latency of find-or-put is minimized by using one-sided RDMA operations. These operations are overlapped as much as possible to reduce the time spent waiting for roundtrips. In contrast to existing work, we use linear probing and argue that this requires fewer roundtrips. The hash table is implemented in UPC. A peak throughput of 114.9 million op/s is reached on an InfiniBand cluster. With a load factor of 0.9, find-or-put can be performed in 4.5 μs on average. Throughput thus remains high even when the table is nearly full.
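To make the find-or-put semantics concrete, the following is a minimal single-node sketch in C11 using linear probing and compare-and-swap. It is not the UPC implementation described above and performs no RDMA. The table size, the reservation of key 0 as the empty marker, the mixing function `hash64`, and the names `find_or_put` and `fop_result` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u        /* hypothetical capacity; must be a power of two */
#define EMPTY ((uint64_t)0)     /* assumption: key 0 is reserved as "empty" */

typedef enum { FOP_INSERTED, FOP_FOUND, FOP_FULL } fop_result;

/* Static storage is zero-initialised, so every bucket starts out EMPTY. */
static _Atomic uint64_t table[TABLE_SIZE];

/* Hypothetical 64-bit mixing function; any well-distributed hash would do. */
static uint64_t hash64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Insert key if absent, otherwise report that it is already present. */
fop_result find_or_put(uint64_t key) {
    assert(key != EMPTY);
    size_t start = (size_t)(hash64(key) & (TABLE_SIZE - 1));

    /* Linear probing: visit consecutive buckets, wrapping around once. */
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t idx = (start + i) & (TABLE_SIZE - 1);
        uint64_t seen = atomic_load(&table[idx]);

        if (seen == EMPTY) {
            uint64_t expected = EMPTY;
            /* Try to claim the empty bucket atomically. */
            if (atomic_compare_exchange_strong(&table[idx], &expected, key))
                return FOP_INSERTED;
            seen = expected;    /* lost the race; expected now holds the winner's key */
        }
        if (seen == key)
            return FOP_FOUND;
        /* Bucket holds a different key: continue with the next bucket. */
    }
    return FOP_FULL;            /* every bucket was probed without success */
}
```

Calling `find_or_put(42)` twice on a fresh table returns `FOP_INSERTED` and then `FOP_FOUND`. Because linear probing visits buckets that are contiguous in memory, a distributed variant could plausibly fetch several candidate buckets with a single remote read, which is one way to interpret the claim that linear probing needs fewer roundtrips.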
