Low Discrepancy Sets Yield Approximate Min-Wise Independent Permutation Families

Motivated by a problem of filtering near-duplicate Web documents, Broder, Charikar, Frieze and Mitzenmacher defined the following notion of $\epsilon$-approximate min-wise independent permutation families. A multiset $F$ of permutations of $\{0,1,\ldots,n-1\}$ is such a family if for all $K \subseteq \{0,1,\ldots,n-1\}$ and any $x \in K$, a permutation $\pi$ chosen uniformly at random from $F$ satisfies $\left|\Pr[\min\{\pi(K)\} = \pi(x)] - \frac{1}{|K|}\right| \le \frac{\epsilon}{|K|}$. We show connections of such families with low discrepancy sets for geometric rectangles, and give explicit constructions of such families $F$ of size $n^{O(\log n)}$ for $\epsilon = 1/n^{\Theta(1)}$, improving upon the previously best-known bound of Indyk. We also present polynomial-size constructions when the min-wise condition is required only for $|K| \le 2^{O(\log^{2/3} n)}$, with $\epsilon \ge 2^{-O(\log^{2/3} n)}$.
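To make the definition concrete, the following is a minimal brute-force sketch that computes, for a small family of permutations, the smallest $\epsilon$ for which the min-wise condition holds. It only illustrates the definition and is not one of the constructions described above; the function name `min_wise_error` and the example family of cyclic shifts are hypothetical choices made for this illustration. The check enumerates every subset $K$, so it is feasible only for very small $n$.

```python
from fractions import Fraction
from itertools import combinations, permutations


def min_wise_error(family, n):
    """Return the smallest eps for which `family` (a list of permutations of
    range(n), each given as a tuple pi with pi[i] the image of i) is
    eps-approximate min-wise independent. Hypothetical helper, brute force."""
    worst = Fraction(0)
    size = len(family)
    for k in range(1, n + 1):
        for K in combinations(range(n), k):
            for x in K:
                # Count permutations under which x attains the minimum over K.
                hits = sum(1 for pi in family if min(pi[y] for y in K) == pi[x])
                prob = Fraction(hits, size)
                # |prob - 1/k| <= eps/k  is equivalent to  eps >= |k*prob - 1|.
                worst = max(worst, abs(k * prob - 1))
    return worst


n = 4
# All n! permutations: exactly min-wise independent.
full = list(permutations(range(n)))
# Only the n cyclic shifts (an illustrative small family, assumed here).
cyclic = [tuple((i + s) % n for i in range(n)) for s in range(n)]

eps_full = min_wise_error(full, n)
eps_cyclic = min_wise_error(cyclic, n)
print(eps_full, eps_cyclic)
```

Exact rational arithmetic via `Fraction` avoids floating-point noise when comparing probabilities to $1/|K|$. The full symmetric group gives error zero, while a much smaller family such as the cyclic shifts generally does not, which is the trade-off between family size and $\epsilon$ that the abstract addresses.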
