Paper information - Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures

As the scale of data to store or monitor in today's networks keeps increasing, hash-based data structures are used more and more widely because of their high memory efficiency and high speed. Most of them, such as Bloom filters, sketches, and d-left hash tables, use more than one hash function. Furthermore, to achieve good randomness, the hash functions used, such as MD5 and SHA1, are very complicated and consume a lot of CPU cycles. As a consequence, computing these hash functions is time-consuming. To address this issue, we propose the Single Hash technique in this paper. It is based on the observation that the hash functions we use produce 32-bit or 64-bit values, whose value range is much larger than we need in practice. We usually have to carry out a modulo operation to map the hash results into a smaller range in the data structures listed above. In this procedure, the information carried by the high bits may be discarded. For example, if the bit array of a Bloom filter has length 2^20 while the hash functions are 32-bit, then 32 - 20 = 12 bits of each hash result are discarded by the modulo operation. We can use these bits to produce more hash values. Therefore, we propose to use a few bit operations to make full use of the information produced by one hash function and generate multiple hash values that can be used in these data structures. The Single Hash technique can be applied to most hash-based data structures. It can significantly improve their speed because, instead of computing multiple hash functions, we only need to compute one hash function and a few simple operations (e.g., bit shift and XOR). Other aspects of performance, such as the memory efficiency and accuracy of these data structures, are not affected by the Single Hash technique.
In this paper, we apply it to three kinds of classic hash-based data structures, i.e., Bloom filters, CM sketches, and d-left hash tables, as case studies, and evaluate their performance with both mathematical analysis and extensive experiments. We have made all our code open source on GitHub.
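
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' construction: the abstract only says that a few bit operations such as shift and XOR are applied to one hash value, so the mixing rule in `derive_indexes`, the helper names, and the use of a truncated BLAKE2b digest as the single 32-bit hash are all assumptions made for illustration. The sizes follow the abstract's example of a 32-bit hash and a bit array of length 2^20.

```python
import hashlib

HASH_BITS = 32    # width of the single base hash value
INDEX_BITS = 20   # bit array of length 2**20, as in the abstract's example


def base_hash32(key: bytes) -> int:
    """Compute the one 'real' hash. Truncated BLAKE2b is a stand-in (assumption)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=4).digest(), "big")


def derive_indexes(h: int, k: int, index_bits: int = INDEX_BITS) -> list:
    """Derive k indexes in [0, 2**index_bits) from one hash value h."""
    mask = (1 << index_bits) - 1
    low = h & mask           # what h % 2**index_bits would keep
    high = h >> index_bits   # the high bits a plain modulo would discard
    # Hypothetical mixing rule (not taken from the paper): XOR the discarded
    # bits back into the low bits at k different shift offsets.
    return [(low ^ (high << i)) & mask for i in range(k)]


class BloomFilter:
    """Bloom filter that computes one hash per key instead of k hashes."""

    def __init__(self, k: int = 4):
        self.k = k
        self.bits = bytearray((1 << INDEX_BITS) // 8)

    def add(self, key: bytes) -> None:
        for idx in derive_indexes(base_hash32(key), self.k):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in derive_indexes(base_hash32(key), self.k))


bf = BloomFilter(k=4)
bf.add(b"10.0.0.1")
print(b"10.0.0.1" in bf)   # an inserted key is always reported as present
```

Each `add` or membership test here calls `base_hash32` once and then performs only shifts, XORs, and masks. Whether a particular mixing rule keeps the false positive rate unchanged depends on how independent the derived indexes are, which this sketch does not analyze.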
