Cuckoo++ hash tables: high-performance hash tables for networking applications

Hash tables are essential data structures for networking applications (e.g., connection tracking, firewalls, network address translators). Among these, cuckoo hash tables provide excellent performance by processing lookups with very few memory accesses (2 to 3 per lookup). Yet they remain memory-bound, and each memory access impacts performance. In this paper, we propose algorithmic improvements to cuckoo hash tables that eliminate unnecessary memory accesses without altering the properties of the original cuckoo hash table, so that all existing theoretical analyses remain applicable. We also present an implementation tailored to run efficiently on Intel Xeon processors, thus supporting NFV and softwarization trends, and compare it to the optimized implementation provided by DPDK. On a single core, our implementation achieves 37M positive lookups per second (i.e., when the key looked up is present in the table) and 60M negative lookups per second, a 45% to 70% improvement over DPDK.
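
To make the memory-access argument concrete, the following is a minimal sketch of a plain bucketized cuckoo hash table, not the Cuckoo++ design or the DPDK implementation. It only illustrates why a lookup inspects at most two buckets, and why a negative lookup always inspects both. The class name `CuckooTable`, its parameters, the BLAKE2-based hashing, and the eviction policy are all assumptions chosen for illustration.

```python
import hashlib


class CuckooTable:
    """Illustrative bucketized cuckoo hash table (hypothetical, not Cuckoo++)."""

    def __init__(self, num_buckets=16, slots_per_bucket=4, max_kicks=64):
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _candidates(self, key):
        # Derive two bucket indices from one 128-bit digest (an assumption;
        # real implementations use fast non-cryptographic hashes).
        digest = hashlib.blake2b(repr(key).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little") % self.num_buckets
        h2 = int.from_bytes(digest[8:], "little") % self.num_buckets
        return h1, h2

    def lookup(self, key):
        """Return (value, buckets_probed); value is None if the key is absent."""
        probed = 0
        for b in self._candidates(key):
            probed += 1  # each bucket visit stands in for one memory access
            for k, v in self.buckets[b]:
                if k == key:
                    return v, probed
        return None, probed  # a negative lookup always probes both buckets

    def insert(self, key, value):
        # Overwrite the value if the key is already stored.
        for b in self._candidates(key):
            bucket = self.buckets[b]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return True
        item = (key, value)
        target = self._candidates(key)[0]
        for _ in range(self.max_kicks):
            for c in self._candidates(item[0]):
                if len(self.buckets[c]) < self.slots:
                    self.buckets[c].append(item)
                    return True
            # Both candidate buckets are full: evict the oldest entry of
            # `target` and try to re-place it in its alternate bucket.
            victim = self.buckets[target].pop(0)
            self.buckets[target].append(item)
            item = victim
            v1, v2 = self._candidates(victim[0])
            target = v2 if target == v1 else v1
        # Simplification: the last displaced entry is dropped here; a real
        # table would resize or rehash instead of losing it.
        raise RuntimeError("eviction chain too long; table needs resizing")


if __name__ == "__main__":
    table = CuckooTable()
    table.insert(("10.0.0.1", 443), "flow-a")
    print(table.lookup(("10.0.0.1", 443)))   # positive lookup
    print(table.lookup(("10.0.0.9", 8080)))  # negative lookup
```

In this sketch a positive lookup may stop after the first bucket, whereas a negative lookup must check both. That second access on misses is the kind of cost an optimization aimed at unnecessary memory accesses would target.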
