The Minimization of Random Hypergraphs

We investigate the maximum-entropy model $\mathcal{B}_{n,m,p}$ for random $n$-vertex, $m$-edge multi-hypergraphs with expected edge size $pn$. We show that the expected size of the minimization $\min(\mathcal{B}_{n,m,p})$, i.e., the number of inclusion-wise minimal edges of $\mathcal{B}_{n,m,p}$, undergoes a phase transition with respect to $m$. If $m$ is at most $1/(1-p)^{(1-p)n}$, then $\mathrm{E}[|\min(\mathcal{B}_{n,m,p})|]$ is of order $\Theta(m)$, while for $m \ge 1/(1-p)^{(1-p+\varepsilon)n}$ for any $\varepsilon > 0$, it is $\Theta(2^{(\mathrm{H}(\alpha) + (1-\alpha) \log_2 p) n}/\sqrt{n})$. Here, $\mathrm{H}$ denotes the binary entropy function and $\alpha = -(\log_{1-p} m)/n$. The result implies that the maximum expected number of minimal edges over all $m$ is $\Theta((1+p)^n/\sqrt{n})$. Our structural findings have algorithmic implications for minimizing an input hypergraph, a task with applications in the profiling of relational databases and to the Orthogonal Vectors problem studied in fine-grained complexity. We make several technical contributions that are of independent interest in probability. First, we improve the Chernoff--Hoeffding theorem on the tail of the binomial distribution. Specifically, we show that for a binomial variable $Y \sim \operatorname{Bin}(n,p)$ and any $0 < x < p$, it holds that $\mathrm{P}[Y \le xn] = \Theta(2^{-\mathrm{D}(x \,\|\, p) n}/\sqrt{n})$, where $\mathrm{D}$ is the binary Kullback--Leibler divergence between Bernoulli distributions. We give explicit upper and lower bounds on the constants hidden in the asymptotic notation that hold for all $n$. Second, we establish that the probability of a set of cardinality $i$ being minimal after $m$ i.i.d. maximum-entropy trials exhibits a sharp threshold behavior at $i^* = n + \log_{1-p} m$.
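To make the notion of minimization concrete, the following Python sketch samples a small multi-hypergraph and extracts its inclusion-wise minimal edges. It rests on an assumption the abstract does not spell out: that the maximum-entropy model with expected edge size $pn$ includes each vertex in each edge independently with probability $p$. The function names `sample_edges` and `minimization` and the bitmask encoding of edges are illustrative choices, not taken from the paper, and the quadratic subset check is a naive baseline rather than the algorithm the paper analyzes.

```python
import random


def sample_edges(n, m, p, rng):
    """Sample m edges over vertices 0..n-1, each encoded as an n-bit mask.

    Assumption: every vertex joins every edge independently with
    probability p, so the expected edge size is p * n.
    """
    return [
        sum(1 << v for v in range(n) if rng.random() < p)
        for _ in range(m)
    ]


def minimization(edges):
    """Return the set of inclusion-wise minimal edges.

    Duplicate edges collapse into one. Edges are processed by increasing
    cardinality, so any edge with a proper subset among the input also
    contains an already accepted minimal edge.
    """
    distinct = sorted(set(edges), key=lambda e: bin(e).count("1"))
    minimal = []
    for e in distinct:
        # f is a subset of e exactly when f & e == f.
        if not any(f & e == f for f in minimal):
            minimal.append(e)
    return set(minimal)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility
    edges = sample_edges(n=12, m=200, p=0.5, rng=rng)
    print(len(minimization(edges)), "minimal edges out of", len(edges))
```

Averaging `len(minimization(...))` over many samples for growing `m` gives an empirical estimate of $\mathrm{E}[|\min(\mathcal{B}_{n,m,p})|]$ that can be compared against the two regimes stated above, although small values of $n$ will only roughly reflect the asymptotic behavior.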
