Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound involves only the marginals of $P$ and $Q$, not their conditionals. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ versus $d_{\rm TV}(P,Q)>\epsilon$ can be performed from $\tilde{O}(|\Sigma|^{\frac{3}{4}(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ is the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n$ and $\epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.
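The subadditivity inequality can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes the normalization $H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$ (the inequality is unaffected by a constant rescaling of $H^2$), uses the three-node chain $X_1 \to X_2 \to X_3$ as the DAG, and all function names (`hellinger_sq`, `random_chain`, `marginal`, `both_sides`) are hypothetical helpers introduced for this example.

```python
import itertools
import math
import random


def hellinger_sq(p, q):
    """Square Hellinger distance, assumed here to be 1 - sum_x sqrt(p(x) q(x)).

    p and q are dicts mapping outcomes to probabilities on the same support.
    """
    return 1.0 - sum(math.sqrt(p[x] * q.get(x, 0.0)) for x in p)


def random_pmf(rng, k):
    """Random strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def random_chain(rng, k=2):
    """Joint pmf of a random Bayesian network on the chain X1 -> X2 -> X3."""
    p1 = random_pmf(rng, k)                      # P(X1 = a)
    p2 = [random_pmf(rng, k) for _ in range(k)]  # p2[a][b] = P(X2 = b | X1 = a)
    p3 = [random_pmf(rng, k) for _ in range(k)]  # p3[b][c] = P(X3 = c | X2 = b)
    return {
        (a, b, c): p1[a] * p2[a][b] * p3[b][c]
        for a, b, c in itertools.product(range(k), repeat=3)
    }


def marginal(joint, coords):
    """Marginal of a joint pmf on the coordinates listed in coords."""
    out = {}
    for x, pr in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + pr
    return out


def both_sides(P, Q):
    """Return (H^2(P, Q), sum over nodes v of H^2 on {v} union parents(v))."""
    lhs = hellinger_sq(P, Q)
    # Neighborhoods {v} union Pi_v for v = X1, X2, X3 in the chain.
    neighborhoods = [(0,), (0, 1), (1, 2)]
    rhs = sum(
        hellinger_sq(marginal(P, s), marginal(Q, s)) for s in neighborhoods
    )
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        lhs, rhs = both_sides(random_chain(rng), random_chain(rng))
        print(f"H^2(P,Q) = {lhs:.4f}  <=  sum of neighborhood terms = {rhs:.4f}")
```

Each printed line compares the left-hand side with the sum of neighborhood terms for one random pair of networks on the chain. A check of this kind only illustrates the inequality on sampled instances and is not evidence for the general statement.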
