An analysis on the performance of hash table-based dictionary implementations with different data usage models

The efficiency of in-memory computing applications depends on the mechanism chosen to store and retrieve strings. Trees and tries are the abstract data types (ADTs) that offer the best efficiency for ordered dictionaries, while the hash table is one of several ADTs that provide an efficient implementation of an unordered dictionary. The performance of a data structure depends on the hardware capabilities of the computing device, such as RAM size, cache size and even the speed of the physical storage media. An application running on real or virtualised hardware therefore has restricted access to memory, and hashing is heavily used in such applications for fast processing. In this work, the performance of six hash table-based dictionary ADT implementations is analysed under different data usage models. The six popular implementations, Khash, Uthash, GoogleDenseHash, TommyHashtable, TommyHashdyn and TommyHashlin, are tested under different hardware and software configurations.
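To make concrete what a hash table-based dictionary ADT for strings does, here is a minimal sketch in C, the language of Khash, Uthash and the Tommy hash tables. It is a hypothetical illustration and not the API of any of the six libraries studied. The names `dict_new`, `dict_put`, `dict_get` and `dict_free`, the FNV-1a hash function, the use of separate chaining, and the rule of doubling the bucket array once the load factor would exceed 1 are all assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry in a bucket's singly linked chain (hypothetical layout). */
typedef struct entry {
    char *key;            /* owned copy of the string key */
    int value;
    struct entry *next;
} entry;

typedef struct {
    entry **buckets;      /* array of chain heads */
    size_t nbuckets;      /* always a power of two, so hash & (n - 1) picks a bucket */
    size_t count;         /* number of stored keys */
} dict;

/* 32-bit FNV-1a hash of a NUL-terminated string (assumed hash choice). */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

dict *dict_new(size_t nbuckets) {
    assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
    dict *d = malloc(sizeof *d);
    if (!d) return NULL;
    d->buckets = calloc(nbuckets, sizeof *d->buckets);
    if (!d->buckets) { free(d); return NULL; }
    d->nbuckets = nbuckets;
    d->count = 0;
    return d;
}

/* Double the bucket array and relink every entry into its new bucket. */
static int dict_grow(dict *d) {
    size_t n = d->nbuckets * 2;
    entry **nb = calloc(n, sizeof *nb);
    if (!nb) return -1;
    for (size_t i = 0; i < d->nbuckets; i++) {
        entry *e = d->buckets[i];
        while (e) {
            entry *next = e->next;
            size_t j = fnv1a(e->key) & (n - 1);
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(d->buckets);
    d->buckets = nb;
    d->nbuckets = n;
    return 0;
}

/* Insert key or update its value; returns 0 on success, -1 on allocation failure. */
int dict_put(dict *d, const char *key, int value) {
    size_t i = fnv1a(key) & (d->nbuckets - 1);
    for (entry *e = d->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return 0; }
    if (d->count + 1 > d->nbuckets) {          /* keep load factor <= 1 */
        if (dict_grow(d) != 0) return -1;
        i = fnv1a(key) & (d->nbuckets - 1);
    }
    entry *e = malloc(sizeof *e);
    if (!e) return -1;
    size_t len = strlen(key) + 1;
    e->key = malloc(len);
    if (!e->key) { free(e); return -1; }
    memcpy(e->key, key, len);
    e->value = value;
    e->next = d->buckets[i];                   /* push onto the chain head */
    d->buckets[i] = e;
    d->count++;
    return 0;
}

/* Look up key; returns 1 and writes the value to *out if found, else 0. */
int dict_get(const dict *d, const char *key, int *out) {
    size_t i = fnv1a(key) & (d->nbuckets - 1);
    for (const entry *e = d->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { *out = e->value; return 1; }
    return 0;
}

void dict_free(dict *d) {
    for (size_t i = 0; i < d->nbuckets; i++) {
        entry *e = d->buckets[i];
        while (e) { entry *next = e->next; free(e->key); free(e); e = next; }
    }
    free(d->buckets);
    free(d);
}
```

Separate chaining is only one possible design. Khash and GoogleDenseHash, for instance, use open addressing, which stores entries in a single contiguous array and tends to make better use of the cache. Because the sketch allocates one node per key, its memory footprint and cache behaviour depend on the allocator and the key lengths. These are the kinds of hardware-sensitive costs that an empirical comparison is meant to expose.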
