Learning transformed product distributions

We consider the problem of learning an unknown product distribution $X$ over $\{0,1\}^n$ using samples of $f(X)$, where $f$ is a \emph{known} transformation function. Each choice of the transformation function $f$ specifies a learning problem in this framework. Information-theoretic arguments show that for every transformation function $f$ the corresponding learning problem can be solved to accuracy $\epsilon$ using $\tilde{O}(n/\epsilon^2)$ examples, by a generic algorithm whose running time may be exponential in $n$. We show that this learning problem can be computationally intractable even for constant $\epsilon$ and rather simple transformation functions. Moreover, the above sample complexity bound is nearly optimal for the general problem: we give a simple explicit linear transformation function $f(x) = w \cdot x$ with integer weights $w_i \leq n$ and prove that the corresponding learning problem requires $\Omega(n)$ samples. As our main positive result, we give a highly efficient algorithm for learning a sum of independent unknown Bernoulli random variables, corresponding to the transformation function $f(x) = \sum_{i=1}^n x_i$. Our algorithm learns to $\epsilon$-accuracy in $\mathrm{poly}(n)$ time, using only $\mathrm{poly}(1/\epsilon)$ samples; surprisingly, this sample complexity is independent of $n$. We also give an efficient algorithm that uses $\log n \cdot \mathrm{poly}(1/\epsilon)$ samples and whose running time is only $\mathrm{poly}(\log n, 1/\epsilon)$.
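The sampling model for the main positive result can be illustrated concretely. The sketch below (not the paper's algorithm, just a minimal illustration of the setup) draws samples of $f(X) = \sum_i X_i$, where each $X_i$ is an independent Bernoulli with an unknown parameter $p_i$, and builds the empirical distribution of the sum; the function names and the use of the empirical pmf are illustrative assumptions, not the authors' method.

```python
import random

def sample_pbd(ps, rng):
    """Draw one sample of f(X) = sum_i X_i, where X_i ~ Bernoulli(p_i)
    independently; the learner sees only this sum, never the X_i."""
    return sum(1 for p in ps if rng.random() < p)

def empirical_pmf(ps, m, seed=0):
    """Naive learner: estimate the distribution of f(X) from m samples
    via the empirical pmf over the support {0, 1, ..., n}."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        s = sample_pbd(ps, rng)
        counts[s] = counts.get(s, 0) + 1
    return {k: v / m for k, v in counts.items()}
```

Note that the empirical-pmf learner needs a number of samples growing with the support size; the point of the paper's result is that the structure of such sums allows $\mathrm{poly}(1/\epsilon)$ samples regardless of $n$.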
