Consider the set <inline-equation> <f> <sc>H</sc> </f> </inline-equation> of all linear (or affine) transformations between two vector spaces over a finite field <italic>F</italic>. We study how good <inline-equation> <f> <sc>H</sc></f></inline-equation> is as a class of hash functions: we hash a set <italic>S</italic> of size <italic>n</italic> into a range of the same cardinality <italic>n</italic> using a function chosen at random from <inline-equation> <f> <sc>H</sc></f></inline-equation>, and look at the expected size of the largest hash bucket. <inline-equation> <f> <sc>H</sc></f></inline-equation> is a universal class of hash functions for any finite field, but with respect to this measure different fields behave differently.
If the finite field <italic>F</italic> has <italic>n</italic> elements, then there is a bad set <italic>S</italic> <inline-equation> <f> ⊂</f></inline-equation> <italic>F</italic><supscrpt>2</supscrpt> of size <italic>n</italic> with expected maximal bucket size <inline-equation> <f> Ω</f></inline-equation>(<italic>n</italic><supscrpt>1/3</supscrpt>). If <italic>n</italic> is a perfect square, then there is even a bad set whose largest bucket <italic>always</italic> has size at least <inline-equation> <f> <rad> <rcd>n</rcd></rad></f></inline-equation>. (This is the worst possible, since with respect to a universal class of hash functions every set of size <italic>n</italic> has expected largest bucket size below <inline-equation> <f> <rad> <rcd>n</rcd></rad></f></inline-equation> + 1/2.)
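The parenthetical bound can be recovered by a standard pair-counting argument. The sketch below is one way to see it and is not necessarily the route taken in the paper; the symbols b_i (bucket sizes), M (largest bucket size) and h (the random hash function) are notation introduced here for illustration.

```latex
% Assumption: universality means \Pr[h(x)=h(y)] \le 1/n for all x \ne y,
% where the range has n elements. b_1,\dots,b_n are the bucket sizes, M = \max_i b_i.
\begin{align*}
\mathbb{E}\Bigl[\sum_{i=1}^{n} \binom{b_i}{2}\Bigr]
  &\le \binom{n}{2}\cdot\frac{1}{n} = \frac{n-1}{2}
  && \text{(expected number of colliding pairs)} \\
\binom{M}{2} &\le \sum_{i=1}^{n} \binom{b_i}{2}
  && \text{(the largest bucket alone contributes } \tbinom{M}{2}\text{)} \\
\frac{\mathbb{E}[M]\,(\mathbb{E}[M]-1)}{2}
  &\le \mathbb{E}\Bigl[\binom{M}{2}\Bigr] \le \frac{n-1}{2}
  && \text{(Jensen, since } t \mapsto t(t-1)/2 \text{ is convex)} \\
\mathbb{E}[M] &\le \frac{1}{2} + \sqrt{n - \tfrac{3}{4}} \;<\; \sqrt{n} + \frac{1}{2}.
\end{align*}
```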
If, however, we consider the field of two elements, then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was <italic>O</italic>(2<supscrpt><inline-equation> <f> <rad> <rcd>log n</rcd></rad></f></inline-equation></supscrpt>). We reduce this upper bound to <italic>O</italic>(log <italic>n</italic> log log <italic>n</italic>). Note that this is not far from the guarantee for a truly random function, for which the expected size of the largest bucket is Θ(log <italic>n</italic>/ log log <italic>n</italic>).
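To make the measure concrete for the field of two elements, the following is a minimal simulation sketch, not the construction analyzed in the paper. It assumes vectors over the two-element field are encoded as Python integers (one bit per coordinate) and that a linear map is given by one random bit mask per output coordinate; the function names and the parameters u (domain dimension), m (range dimension) and trials are hypothetical choices made for this illustration.

```python
import random
from collections import Counter

def random_linear_map(u, m, rng):
    """Draw m random u-bit row masks; row i defines output bit i as a parity."""
    return [rng.getrandbits(u) for _ in range(m)]

def apply_map(rows, x):
    """Apply the linear map over GF(2): output bit i is <rows[i], x> mod 2."""
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y

def max_bucket(rows, keys):
    """Size of the largest bucket when keys are hashed with the given map."""
    counts = Counter(apply_map(rows, x) for x in keys)
    return max(counts.values())

def average_max_bucket(u, m, trials, seed=0):
    """Average largest-bucket size over random maps, for one fixed random key set.

    The key set has n = 2**m elements, matching a range of the same cardinality.
    """
    rng = random.Random(seed)
    n = 1 << m
    keys = rng.sample(range(1 << u), n)  # assumption: S is a random set, not a worst case
    total = 0
    for _ in range(trials):
        total += max_bucket(random_linear_map(u, m, rng), keys)
    return total / trials

if __name__ == "__main__":
    print(average_max_bucket(u=32, m=8, trials=200))
```

Because the key set here is random rather than adversarial, the printed average only illustrates the quantity being bounded; it says nothing about worst-case sets.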
In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset <italic>S</italic> of a vector space <italic>D</italic> over <bold>Z</bold><subscrpt>2</subscrpt>, and consider a random linear mapping of <italic>D</italic> to a smaller vector space <italic>R</italic>. If the cardinality of <italic>S</italic> is larger than <italic>c</italic><subscrpt>ε</subscrpt>|<italic>R</italic>|log|<italic>R</italic>|, then with probability at least 1 - ε, the image of <italic>S</italic> covers every element of the range.
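The covering statement can be probed empirically. The sketch below is an illustration under stated assumptions, not the authors' proof technique: vectors over Z_2 are encoded as integers, the set S is a fixed random subset of D, the logarithm is taken base 2, and the constant c is a free parameter standing in for the unspecified constant that depends on ε. All function and parameter names are hypothetical.

```python
import math
import random

def apply_map(rows, x):
    """Linear map over GF(2): output bit i is the parity of rows[i] & x."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def covers_range(rows, keys):
    """True if the image of keys under the map hits all 2**len(rows) range elements."""
    image = {apply_map(rows, x) for x in keys}
    return len(image) == 1 << len(rows)

def estimate_cover_probability(u, m, c, trials, seed=0):
    """Fraction of random linear maps D -> R whose image of a fixed S covers R.

    D has dimension u, R has dimension m, and |S| = ceil(c * |R| * log2|R|),
    capped at |D|. The constant c is an assumption made for this experiment.
    """
    rng = random.Random(seed)
    size = min(1 << u, math.ceil(c * (1 << m) * m))
    keys = rng.sample(range(1 << u), size)  # fixed set S for all trials
    hits = 0
    for _ in range(trials):
        rows = [rng.getrandbits(u) for _ in range(m)]
        hits += covers_range(rows, keys)
    return hits / trials

if __name__ == "__main__":
    for c in (0.5, 1, 2, 4):
        print(c, estimate_cover_probability(u=24, m=6, c=c, trials=200))
```

A random map may fail to be surjective at all (its matrix can have rank below m), so the estimated probability stays strictly below 1 even for large c; that failure event is part of the ε in the statement.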