Multiattribute hashing using Gray codes

Multiattribute hashing and its variations have been proposed in the past for partial match and range queries. The main idea is that each record yields a bitstring @@@@ (“record signature”), according to the values of its attributes. The binary value (@@@@)2 of this string decides the bucket in which the record is stored. In this paper we propose to use Gray codes instead of binary codes to map record signatures to buckets. In Gray codes, successive codewords differ in the value of exactly one bit position; thus, successive buckets hold records with similar record signatures. The proposed method achieves better clustering of similar records and avoids some of the (expensive) random disk accesses, replacing them with sequential ones. We develop a mathematical model, derive formulas giving the average performance of both methods, and show that the proposed method achieves 0% to 50% relative savings over binary codes. We also discuss how Gray codes could be applied to some retrieval methods designed for range queries, such as the grid file [Nievergelt84a] and the approach based on the so-called z-ordering [Orenstein84a].
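
The clustering effect described above can be illustrated with a minimal Python sketch. It assumes the binary reflected Gray code and assumes that, under the Gray scheme, a record's bucket address is the position (rank) of its signature in the Gray code sequence. The function names and the pattern notation (`*` for an unspecified bit) are hypothetical choices made for this illustration, not taken from the paper. A "cluster" here means a run of consecutive bucket addresses, which could be read with one random access followed by sequential ones.

```python
def gray_encode(n: int) -> int:
    """Return the n-th codeword of the binary reflected Gray code."""
    return n ^ (n >> 1)


def gray_rank(g: int) -> int:
    """Inverse of gray_encode: position of codeword g in the Gray sequence."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def matching_signatures(pattern: str) -> list[int]:
    """All signatures matching a partial match pattern such as '**1'.

    '0' and '1' are specified bits; '*' is an unspecified bit.
    """
    results = [0]
    for ch in pattern:
        if ch == "*":
            results = [(r << 1) | b for r in results for b in (0, 1)]
        else:
            results = [(r << 1) | int(ch) for r in results]
    return results


def count_clusters(buckets) -> int:
    """Count runs of consecutive bucket addresses."""
    ordered = sorted(buckets)
    return sum(
        1 for i, b in enumerate(ordered) if i == 0 or b != ordered[i - 1] + 1
    )


def clusters(pattern: str) -> tuple[int, int]:
    """Return (clusters under binary mapping, clusters under Gray mapping)."""
    sigs = matching_signatures(pattern)
    binary = count_clusters(sigs)                       # bucket = binary value
    gray = count_clusters(gray_rank(s) for s in sigs)   # bucket = Gray rank
    return binary, gray


if __name__ == "__main__":
    # Successive 3-bit Gray codewords differ in exactly one bit position.
    print([format(gray_encode(i), "03b") for i in range(8)])
    print(clusters("**1"))  # (4, 2): half as many clusters with Gray codes
    print(clusters("1**"))  # (1, 1): no difference for this query
```

For the 3-bit query `**1`, the matching buckets are 1, 3, 5, 7 under the binary mapping (four clusters) and 1, 2, 5, 6 under the Gray mapping (two clusters), while the query `1**` gives a single cluster under both. These two small cases happen to sit at the ends of the 0% to 50% range quoted in the abstract; they are illustrations only, not a reproduction of the paper's analysis.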

[1] Kotagiri Ramamohanarao, et al. Partial-match retrieval using hashing and descriptors, 1983, TODS.

[2] Kotagiri Ramamohanarao, et al. Partial-match retrieval for dynamic files, 1982, BIT.

[3] Per-Åke Larson, et al. Performance analysis of linear hashing with partial expansions, 1982, TODS.

[4] Ronald L. Rivest, et al. Partial-Match Retrieval Algorithms, 1976, SIAM J. Comput.

[5] Alfred V. Aho, et al. Optimal partial-match retrieval when fields are independently specified, 1979, ACM Trans. Database Syst.

[6] John W. Lloyd. Optimal partial-match retrieval, 1980, BIT Comput. Sci. Sect.

[7] Ronald Fagin, et al. Extendible hashing—a fast access method for dynamic files, 1979, ACM Trans. Database Syst.

[8] G. N.N. Martin, et al. Spiral Storage: Incrementally Augmentable Hash Addressed Storage, 1979.

[9] Jon Louis Bentley, et al. Multidimensional binary search trees used for associative searching, 1975, CACM.

[10] Witold Litwin, et al. Linear Hashing: A new Algorithm for Files and Tables Addressing, 1980, ICOD.

[11] Jürg Nievergelt, et al. The Grid File: An Adaptable, Symmetric Multikey File Structure, 1984, TODS.

[12] J. T. Robinson, et al. The K-D-B-tree: a search structure for large multidimensional dynamic indexes, 1981, SIGMOD '81.

[13] T. H. Merrett, et al. A class of data structures for associative searching, 1984, PODS.

[14] James B. Rothnie, et al. Attribute based file organization in a paged memory environment, 1974, CACM.

[15] Alfonso F. Cardenas. Analysis and performance of inverted data base structures, 1975, CACM.

[16] E. Gilbert. Gray codes and paths on the N-cube, 1958.