On the Structure, Covering, and Learning of Poisson Multinomial Distributions

An (n, k)-Poisson Multinomial Distribution (PMD) is the distribution of the sum of n independent random vectors supported on the set Bₖ = {e₁, ..., eₖ} of standard basis vectors in ℝᵏ. We prove a structural characterization of these distributions, showing that, for all ε > 0, any (n, k)-Poisson multinomial random vector is ε-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent (poly(k/ε), k)-Poisson multinomial random vector. Our structural characterization extends the multidimensional CLT of Valiant and Valiant by applying simultaneously to all approximation requirements ε. In particular, it overcomes factors depending on log n and, importantly, on the minimum eigenvalue of the PMD's covariance matrix. We use our structural characterization to obtain an ε-cover, in total variation distance, of the set of all (n, k)-PMDs, significantly improving the cover size of Daskalakis and Papadimitriou and obtaining the same qualitative dependence of the cover size on n and ε as the k = 2 cover of Daskalakis and Papadimitriou. We further exploit this structure to show that (n, k)-PMDs can be learned to within ε in total variation distance from Õₖ(1/ε²) samples, which is near-optimal in terms of dependence on ε and independent of n. In particular, our result generalizes the single-dimensional result of Daskalakis, Diakonikolas and Servedio for Poisson binomials to arbitrary dimension. Finally, as a corollary of our results on PMDs, we give an Õₖ(1/ε²)-sample algorithm for learning (n, k)-sums of independent integer random variables (SIIRVs), which is near-optimal for constant k.
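
To make the definition concrete, the following is a minimal Python sketch, not taken from the paper, that computes the exact probability mass function of a small (n, k)-PMD by convolving its n categorical summands one at a time. It also computes the mean vector and the covariance matrix (the matrix whose minimum eigenvalue the abstract refers to) and the total variation distance between two mass functions. The function names pmd_pmf, pmd_mean_cov and tv_distance are hypothetical, and the input is assumed to be an n × k list of rows p_i with p_i[j] = Pr[X_i = e_j].

```python
from collections import defaultdict


def pmd_pmf(p):
    """Exact pmf of an (n, k)-PMD, by brute-force convolution.

    Hypothetical helper (not the paper's algorithm). `p` is an n x k list of
    rows; row i holds Pr[X_i = e_j] for j = 0..k-1 and must sum to 1.
    Returns a dict mapping count vectors (tuples of length k) to probabilities.
    """
    k = len(p[0])
    dist = {(0,) * k: 1.0}  # the empty sum is the zero vector with probability 1
    for row in p:
        if len(row) != k or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must have k entries summing to 1")
        nxt = defaultdict(float)
        for counts, mass in dist.items():
            for j, pj in enumerate(row):
                if pj == 0.0:
                    continue  # skip outcomes that cannot occur
                # Adding X_i = e_j increments coordinate j of the count vector.
                moved = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[moved] += mass * pj
        dist = dict(nxt)
    return dist


def pmd_mean_cov(p):
    """Mean vector and covariance matrix of an (n, k)-PMD.

    By independence both are sums over the n summands: a single categorical
    vector with probabilities p_i has mean p_i and covariance
    diag(p_i) - p_i p_i^T.
    """
    k = len(p[0])
    mean = [sum(row[j] for row in p) for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for row in p:
        for a in range(k):
            for b in range(k):
                cov[a][b] += (row[a] if a == b else 0.0) - row[a] * row[b]
    return mean, cov


def tv_distance(P, Q):
    """Total variation distance between two pmfs given as dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    p = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]  # n = 2 summands, k = 3
    P = pmd_pmf(p)
    mean, cov = pmd_mean_cov(p)
    print(sorted(P.items()))
    print(mean)
    print(cov)
    print(tv_distance(P, P))
```

The state space has up to C(n+k−1, k−1) count vectors, so this exact computation is only practical for small n and k, for example to check an approximation numerically on toy instances. Because the coordinates of a PMD always sum to n, every row of the returned covariance sums to zero, so the full k × k covariance matrix is singular.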

[1] Dietmar Pfeifer et al. Poisson approximations of multinomial distributions and point processes, 1988.

[2] J. Kiefer et al. Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator, 1956.

[3] Christos H. Papadimitriou et al. Computing Equilibria in Anonymous Games, 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[4] Christos H. Papadimitriou et al. Discretized Multinomial Distributions and Nash Equilibria in Anonymous Games, 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[5] Rocco A. Servedio et al. Learning Poisson Binomial Distributions, 2011, STOC '12.

[6] A. Barbour et al. Poisson Approximation, 1992.

[7] B. Roos. Multinomial and Krawtchouk Approximations to the Generalized Multinomial Distribution, 2002.

[8] Alon Orlitsky et al. Sorting with adversarial comparators and application to density estimation, 2014, 2014 IEEE International Symposium on Information Theory.

[9] V. Bentkus. A Lyapunov-type Bound in ℝᵈ, 2005.

[10] Ryan O'Donnell et al. Learning Sums of Independent Integer Random Variables, 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.

[11] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates, 1941.

[12] Constantinos Daskalakis et al. Testing Poisson Binomial Distributions, 2014, SODA.

[13] Wei-Liem Loh. Stein's Method and Multinomial Approximation, 1992.

[14] Luc Devroye et al. Combinatorial methods in density estimation, 2001, Springer series in statistics.

[15] Christos H. Papadimitriou et al. Sparse covers for sums of indicators, 2013, ArXiv.

[16] I. Shevtsova. An improvement of convergence rate estimates in the Lyapunov theorem, 2010.

[17] Gregory Valiant et al. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, 2011, STOC '11.

[18] Constantinos Daskalakis et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians, 2013, COLT.

[19] Daniel M. Kane et al. Nearly Optimal Learning and Sparse Covers for Sums of Independent Integer Random Variables, 2015, ArXiv.

[20] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[21] Bero Roos et al. Multinomial and Krawtchouk Approximations to the Generalized Multinomial Distribution, 2001.

[22] Christos H. Papadimitriou et al. On oblivious PTAS's for Nash equilibrium, 2009, STOC '09.

[23] Christos H. Papadimitriou et al. Approximate Nash equilibria in anonymous games, 2015, J. Econ. Theory.

[24] A. Barbour. Stein's method and Poisson process convergence, 1988, Journal of Applied Probability.

[25] James L. Johnson. Probability and Statistics for Computer Science, 2003.

[26] Gregory Valiant et al. A CLT and tight lower bounds for estimating entropy, 2010, Electron. Colloquium Comput. Complex.