The Analysis of Double Hashing

Abstract. In this paper we analyze the performance of double hashing, a well-known hashing algorithm in which we probe the hash table along arithmetic progressions where the initial element and the increment of the progression are chosen randomly and independently, depending only on the key K of the search. We prove that double hashing is asymptotically equivalent to uniform probing for load factors α not exceeding a certain constant α₀ = 0.31…. Uniform hashing refers to a technique which exhibits no clustering and is known to be optimal in a certain sense. Our proof method has a different flavor from those previously used in algorithmic analysis. We begin by showing that the tail of the hypergeometric distribution a fixed percentage away from the mean is exponentially small. We use this result to prove that random subsets of cardinality αm of the finite ring of integers modulo m always have nearly the expected number of arithmetic progressions of length k, except with exponentially small probability. We then use this theorem to start up a process (called the extension process) of looking at snapshots of the table as it fills up with double hashing. Between steps of the extension process we can show that the effect of clustering is negligible, and that we therefore never depart too far from the truly random situation.
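
For readers unfamiliar with the scheme, the probing rule described above can be illustrated with a minimal Python sketch. It is not taken from the paper. The names `probe_sequence` and `insert` are hypothetical, the table size m is assumed to be prime so that every nonzero increment visits all m slots, and the two hash functions (`key % m` for the initial element and `1 + key % (m - 1)` for the increment) are simple deterministic stand-ins for the random, independent choices that the analysis assumes.

```python
def probe_sequence(key, m):
    """Yield the m slots probed for `key`, in order.

    Assumptions (illustrative only): m is prime, and the two hash
    functions below stand in for random, independent choices.
    """
    start = key % m            # initial element of the arithmetic progression
    step = 1 + key % (m - 1)   # increment, always in the range 1..m-1
    for i in range(m):
        yield (start + i * step) % m


def insert(table, key):
    """Store `key` in the first empty slot on its probe sequence.

    Returns the number of probes used; raises if the table is full.
    """
    m = len(table)
    for probes, slot in enumerate(probe_sequence(key, m), start=1):
        if table[slot] is None:
            table[slot] = key
            return probes
    raise RuntimeError("table is full")


if __name__ == "__main__":
    table = [None] * 7
    for k in (10, 3, 17):
        print(k, "inserted after", insert(table, k), "probe(s)")
```

Because m is prime and the increment is nonzero modulo m, each probe sequence is a permutation of the table slots, so an insertion fails only when the table is completely full.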