Linear Hashing

Consider the set ${\cal H}$ of all linear (or affine) transformations between two vector spaces over a finite field $F$. We study how good ${\cal H}$ is as a class of hash functions; namely, we consider hashing a set $S$ of size $n$ into a range having the same cardinality $n$ by a randomly chosen function from ${\cal H}$ and look at the expected size of the largest hash bucket. ${\cal H}$ is a universal class of hash functions for any finite field, but with respect to our measure different fields behave differently.

If the finite field $F$ has $n$ elements, then there is a bad set $S\subset F^2$ of size $n$ with expected maximal bucket size $\Omega(n^{1/3})$. If $n$ is a perfect square, then there is even a bad set with largest bucket size {\em always} at least $\sqrt n$. (This is worst possible, since with respect to a universal class of hash functions every set of size $n$ has expected largest bucket size below $\sqrt n+1/2$.)

If, however, we consider the field of two elements, then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was $O(2^{\sqrt{\log n}})$. We reduce this upper bound to $O(\log n\log\log n)$. Note that this is not far from the guarantee for a random function, for which the average largest bucket would be $\Theta(\log n/\log\log n)$.

In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset $S$ of a vector space $D$ over ${\bf Z}_2$, and consider a random linear mapping of $D$ to a smaller vector space $R$. If the cardinality of $S$ is larger than $c_\epsilon|R|\log|R|$, then with probability $1-\epsilon$ the image of $S$ will cover all elements in the range.
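To make the two quantities discussed above concrete, the following is a minimal Python sketch, not taken from the paper, of hashing with a random linear map over ${\bf Z}_2$. It assumes keys are bit vectors packed into integers and represents the map as a random 0/1 matrix stored one row per integer. The names `random_linear_map`, `apply_map`, `max_bucket`, and `covers_range` are hypothetical, chosen only for illustration. The sketch measures the largest bucket for one sampled map and checks whether the image of a key set covers the whole range; it does not verify any of the stated bounds.

```python
import random
from collections import Counter


def random_linear_map(in_bits, out_bits, rng):
    """Sample a random out_bits x in_bits matrix over Z_2 (one int bitmask per row)."""
    return [rng.getrandbits(in_bits) for _ in range(out_bits)]


def apply_map(rows, x):
    """Compute h(x) = Mx over Z_2: output bit i is the parity of rows[i] AND x."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def max_bucket(keys, rows):
    """Size of the largest hash bucket when every key is hashed with the given map."""
    counts = Counter(apply_map(rows, x) for x in keys)
    return max(counts.values())


def covers_range(keys, rows, out_bits):
    """True if the image of the key set hits every one of the 2**out_bits range elements."""
    return len({apply_map(rows, x) for x in keys}) == 1 << out_bits


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, an arbitrary choice for repeatability
    in_bits, out_bits = 32, 10
    n = 1 << out_bits        # hash n keys into a range of the same cardinality n

    keys = set()
    while len(keys) < n:
        keys.add(rng.getrandbits(in_bits))

    rows = random_linear_map(in_bits, out_bits, rng)
    print("largest bucket:", max_bucket(keys, rows))
    print("range covered:", covers_range(keys, rows, out_bits))
```

Averaging `max_bucket` over many independently sampled maps gives an empirical estimate of the expected largest bucket for a particular set $S$. A randomly drawn key set, as used in the demo, says nothing about worst-case sets.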
