Analysis of the Search Performance of Coalesced Hashing

An analysis is presented of the coalesced hashing method, in which a portion of memory (called the address region) serves as the range of the hash function while the rest of memory (called the cellar) is devoted solely to storing records that collide when inserted. If the cellar should get full, subsequent colliders must be stored in empty slots in the address region and thus may cause later collisions. Varying the relative size of the cellar affects search performance. The main result of this paper expresses the average search times as a function of the number of records and the cellar size, solving a long-standing open problem. These formulas are used to pick the cellar size that leads to optimum search performance, and it is shown that this "tuned" method outperforms several well-known hashing schemes. A discussion of past work on coalesced hashing and a generalization of the method to nonuniform hash functions conclude the paper.

Categories and Subject Descriptors: E.2 [Data]: Data Storage Representations--hash-table representations; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--sorting and searching; G.2.1 [Discrete Mathematics]: Combinatorics--generating functions, permutations and combinations, recurrences and difference equations; G.3 [Mathematics of Computing]: Probability and Statistics--random number generation; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--search process

General Terms: Algorithms, Performance, Theory

Additional
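
The address-region and cellar layout described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's own implementation: the class name `CoalescedHashTable`, the top-down scan for empty slots, the choice to link each collider at the end of its chain, and the use of Python's built-in `hash` are all assumptions made for illustration. Deletion is not supported.

```python
class CoalescedHashTable:
    """Illustrative coalesced hash table with a cellar (assumed design)."""

    def __init__(self, address_size, cellar_size):
        self.m = address_size                    # hash range: slots 0 .. m-1
        self.size = address_size + cellar_size   # cellar: slots m .. size-1
        self.keys = [None] * self.size
        self.next = [-1] * self.size             # -1 marks the end of a chain
        self.free = self.size - 1                # empty-slot scan starts at the top

    def _hash(self, key):
        # Assumption: any hash function onto the address region would do here.
        return hash(key) % self.m

    def insert(self, key):
        i = self._hash(key)
        if self.keys[i] is None:                 # no collision: store at hash address
            self.keys[i] = key
            return
        while True:                              # walk the chain to its last slot
            if self.keys[i] == key:
                return                           # key already present
            if self.next[i] == -1:
                break
            i = self.next[i]
        # Take the highest-numbered empty slot. The cellar is used up first;
        # once it is full, colliders land in the address region, where they
        # can cause later collisions and make chains coalesce.
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.next[i] = self.free

    def search(self, key):
        """Return (found, probes), where probes counts the slots examined."""
        i = self._hash(key)
        probes = 1
        if self.keys[i] is None:
            return False, probes
        while True:
            if self.keys[i] == key:
                return True, probes
            if self.next[i] == -1:
                return False, probes
            i = self.next[i]
            probes += 1


table = CoalescedHashTable(address_size=5, cellar_size=2)
for k in (0, 5, 10, 15, 4):                      # 0, 5, 10, 15 all hash to slot 0
    table.insert(k)
print(table.search(4))                           # key 4 shares a chain with the keys hashed to 0
```

In this run the keys 5 and 10 fill the two cellar slots, so 15 overflows into slot 4 of the address region. The later key 4 then collides with it even though the two have different hash addresses, which is the coalescing effect that the cellar size is meant to control.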
