Paper Information - Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound involves only the marginals of $P$ and $Q$, not their conditionals. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ versus $d_{\mathrm{TV}}(P,Q)>\epsilon$ can be performed from $\tilde{O}(|\Sigma|^{\frac{3}{4}(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ is the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n$ and $\epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.
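The subadditivity inequality stated in the abstract can be checked numerically on a toy instance. The sketch below is illustrative only and is not taken from the paper: the four-node DAG, the binary alphabet, the randomly drawn conditional tables, and all function names (`hellinger_sq`, `random_cpts`, `joint`, `marginal`, `check_subadditivity`) are assumptions made for the example. It uses the convention $H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$; a different normalization would rescale both sides of the inequality equally.

```python
import itertools
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance, here taken as 1 - sum_x sqrt(p(x) q(x))."""
    return 1.0 - float(np.sum(np.sqrt(p * q)))

def random_cpts(parents, k, rng):
    """Draw one random conditional table per node (hypothetical helper).
    Table for node v has shape (k,) * len(parents[v]) + (k,); last axis sums to 1."""
    cpts = {}
    for v, pa in parents.items():
        size = (k,) * len(pa) if pa else None
        cpts[v] = rng.dirichlet(np.ones(k), size=size)
    return cpts

def joint(parents, cpts, k):
    """Joint distribution of the Bayesian network as a k^n array (brute force)."""
    n = len(parents)
    p = np.ones((k,) * n)
    for x in itertools.product(range(k), repeat=n):
        for v, pa in parents.items():
            p[x] *= cpts[v][tuple(x[u] for u in pa) + (x[v],)]
    return p

def marginal(p, keep):
    """Marginal of p on the coordinates in `keep` (axis order is preserved)."""
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    return p.sum(axis=drop) if drop else p

def check_subadditivity(seed=0, k=2):
    # Assumed toy DAG: node -> tuple of parents (0 -> 1, {0, 1} -> 2, 2 -> 3).
    parents = {0: (), 1: (0,), 2: (0, 1), 3: (2,)}
    rng = np.random.default_rng(seed)
    P = joint(parents, random_cpts(parents, k, rng), k)
    Q = joint(parents, random_cpts(parents, k, rng), k)
    lhs = hellinger_sq(P, Q)
    # Sum of squared Hellinger distances between marginals on {v} ∪ Π_v.
    rhs = sum(
        hellinger_sq(marginal(P, {v, *pa}), marginal(Q, {v, *pa}))
        for v, pa in parents.items()
    )
    return lhs, rhs

lhs, rhs = check_subadditivity()
print(f"H^2(P,Q) = {lhs:.4f}, sum of local terms = {rhs:.4f}")
```

Each term on the right-hand side involves a distribution over at most $|\Sigma|^{d+1}$ outcomes, which is where the dependence on $|\Sigma|$ and $d$ in the sample-complexity bounds comes from. A check like this only illustrates the inequality on one random instance; it is no substitute for the proof.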

Constantinos Daskalakis | Qinxuan Pan
