Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over $\mathbb{F}_2=\{0,1\}$ with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions $n(1-\varepsilon) \times n$ is generated as follows: In each row, choose a block of length $L=O((\log n)/\varepsilon)$ at a random position. The entries outside the block are 0, and the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured in terms of a simple Markov model from queuing theory. Bounds for expected construction time and high success probability follow from results in this area. By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function $f\colon S \to \{0,1\}$ for some set $S$ of $m=(1-\varepsilon)n$ keys. It requires $m/(1-\varepsilon)$ bits of space, construction takes $O(m/\varepsilon^2)$ expected time on a word RAM, while queries take $O(1/\varepsilon)$ time and access only one contiguous segment of $O((\log m)/\varepsilon)$ bits in the representation. The method is competitive with state-of-the-art methods. By well-established methods, the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
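To make the construction concrete, the following is a minimal Python sketch of a retrieval data structure in the spirit described above; it is an illustration under stated assumptions, not the paper's implementation. Each key is hashed to a row with one random block of length $L$, the rows are sorted by block position, and the system is brought into echelon form over $\mathbb{F}_2$ and solved by back substitution. The names `block_row`, `build`, and `query`, the use of BLAKE2b as hash function, the fixed choice $L=64$, and the retry-with-a-new-seed loop are all assumptions made for this example. For brevity, rows are stored as Python integers of up to $n$ bits, so the sketch does not reproduce the running times claimed in the abstract.

```python
import hashlib


def block_row(key: str, seed: int, n: int, L: int):
    """Hash a key to (start, row): a block of L coin-toss bits at a random start.

    The row is an int in which bit j stands for column j. Assumes L <= 448,
    since the pattern is cut from a 512-bit digest.
    """
    digest = hashlib.blake2b(key.encode(), digest_size=64,
                             salt=seed.to_bytes(8, "little")).digest()
    h = int.from_bytes(digest, "little")
    start = (h & 0xFFFFFFFFFFFFFFFF) % (n - L + 1)   # block position
    pattern = (h >> 64) & ((1 << L) - 1)             # block contents
    return start, pattern << start


def _solve(f, seed, n, L):
    """Return z with <row(key), z> = f[key] over F_2, or None on rank deficiency."""
    rows = sorted(block_row(k, seed, n, L) + (int(b) & 1,) for k, b in f.items())
    pivot = [None] * n                       # pivot[c]: reduced row whose leftmost 1 is in column c
    for _start, row, bit in rows:            # rows arrive sorted by block position
        while row:
            c = (row & -row).bit_length() - 1    # column of the leftmost 1
            if pivot[c] is None:
                pivot[c] = (row, bit)
                break
            prow, pbit = pivot[c]            # eliminate column c and continue
            row ^= prow
            bit ^= pbit
        else:
            return None                      # row became zero: no full row rank
    z = 0
    for c in range(n - 1, -1, -1):           # back substitution, free variables stay 0
        if pivot[c] is not None:
            row, bit = pivot[c]
            if (bin(row & z).count("1") + bit) & 1:
                z |= 1 << c
    return z


def build(f, eps=0.2, L=64, max_seeds=50):
    """Build (seed, n, L, z) for a dict f mapping keys to bits, using n ~ m/(1-eps) bits."""
    n = max(L, int(len(f) / (1 - eps)) + 1)
    for seed in range(max_seeds):            # retrying with a fresh seed is an assumption of this sketch
        z = _solve(f, seed, n, L)
        if z is not None:
            return seed, n, L, z
    raise RuntimeError("no seed gave a full-rank system")


def query(structure, key: str) -> int:
    """Evaluate f(key) as the parity of the key's block ANDed with z."""
    seed, n, L, z = structure
    _start, row = block_row(key, seed, n, L)
    return bin(row & z).count("1") & 1


if __name__ == "__main__":
    f = {f"key{i}": i % 2 for i in range(300)}
    s = build(f)
    assert all(query(s, k) == v for k, v in f.items())
```

A query touches only the $L$ consecutive bits of `z` that start at the key's block position, which is the locality property the abstract refers to. A practical implementation would keep each row as a start offset plus an $L$-bit word instead of an $n$-bit integer.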
