Learning and Testing Junta Distributions

We consider the problem of learning distributions in the presence of irrelevant features. We formalize this problem by introducing a new notion of k-junta distributions. Informally, a distribution D over a domain X is a k-junta distribution with respect to another distribution U over the same domain if there is a set J ⊆ [n] of at most k coordinates, |J| ≤ k, that captures the difference between D and U. We show that it is possible to learn k-junta distributions with respect to the uniform distribution over the Boolean hypercube {0, 1}^n in time poly(n^k, 1/ε), where ε is the target accuracy. This result is obtained via a new Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a k-junta distribution with respect to the uniform distribution, and we give a nearly optimal algorithm for this task. Both the analysis of the algorithm and the lower bound showing its optimality are obtained by establishing connections between the problem of testing junta distributions and the problem of testing uniformity of weighted collections of distributions.
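The Fourier connection behind the learning result can be illustrated with a minimal Python sketch. It assumes one natural reading of the informal definition: D is a k-junta distribution with respect to the uniform distribution when the density f(x) = 2^n · D(x) depends only on the coordinates in J, so that the bits outside J are uniform and independent of x_J. Under that assumption, the Fourier coefficient f̂(S) = E_{x~D}[χ_S(x)] vanishes unless S ⊆ J, and it can be estimated directly from samples. The sketch estimates every coefficient with |S| ≤ k, keeps the coordinates that appear in a large one, and learns the marginal on them empirically. It is a simplified illustration, not the algorithm from the paper; the function names, the threshold parameter, and the example distribution are hypothetical, and no accuracy guarantee is claimed.

```python
import itertools
import random

# Assumption (not from the source): D is a k-junta w.r.t. uniform iff
# f(x) = 2^n * D(x) depends only on x_J for some J with |J| <= k.


def chi(S, x):
    """Parity character chi_S(x) = (-1)^(sum of x_i over i in S), for x in {0,1}^n."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def estimate_coefficients(samples, n, k):
    """Empirical estimate of E_{x~D}[chi_S(x)] for every nonempty S with |S| <= k.

    With f(x) = 2^n * D(x), this expectation equals the Fourier coefficient f_hat(S).
    The loop visits sum_{s<=k} C(n, s) = O(n^k) sets."""
    m = len(samples)
    coeffs = {}
    for size in range(1, k + 1):
        for S in itertools.combinations(range(n), size):
            coeffs[S] = sum(chi(S, x) for x in samples) / m
    return coeffs


def learn_junta_distribution(samples, n, k, threshold):
    """Hypothetical sketch: keep coordinates that appear in some S with a large
    estimated coefficient, then estimate the marginal of D on those coordinates."""
    J = set()
    for S, c in estimate_coefficients(samples, n, k).items():
        if abs(c) > threshold:
            J.update(S)
    J = sorted(J)
    counts = {}
    for x in samples:
        key = tuple(x[i] for i in J)
        counts[key] = counts.get(key, 0) + 1
    marginal = {key: c / len(samples) for key, c in counts.items()}
    return J, marginal


def hypothesis_prob(x, J, marginal, n):
    """Probability of x under the hypothesis: learned marginal on J,
    independent uniform bits on the other n - |J| coordinates."""
    key = tuple(x[i] for i in J)
    return marginal.get(key, 0.0) * 2.0 ** -(n - len(J))


def sample_example(n, rng):
    """Hypothetical 2-junta distribution with J = {1, 4}: every bit is uniform,
    except that x[4] copies x[1] with probability 0.9."""
    x = [rng.randint(0, 1) for _ in range(n)]
    x[4] = x[1] if rng.random() < 0.9 else 1 - x[1]
    return tuple(x)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_example(6, rng) for _ in range(5000)]
    J, marginal = learn_junta_distribution(samples, n=6, k=2, threshold=0.2)
    print("relevant coordinates:", J)
    print("estimated marginal on J:", marginal)
```

In the example, x_1 and x_4 are each uniform on their own, so every degree-1 coefficient is close to zero and only the degree-2 coefficient for S = {1, 4}, whose expected value is 0.9 − 0.1 = 0.8, exposes the junta. This is why the sketch enumerates all sets of size at most k, at a cost of O(n^k) coefficient estimates.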

[1] Ryan O'Donnell, et al. Learning functions of k relevant variables, 2004, J. Comput. Syst. Sci.

[2] Eli Upfal, et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.

[3] Ronitt Rubinfeld, et al. Testing that distributions are close, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[4] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[5] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, ArXiv.

[6] Eldar Fischer, et al. On the power of conditional samples in distribution testing, 2013, ITCS '13.

[7] Ronitt Rubinfeld, et al. Testing Properties of Collections of Distributions, 2013, Theory Comput.

[8] Rocco A. Servedio, et al. Learning k-Modal Distributions via Testing, 2012, Theory Comput.

[9] Ferat Sahin, et al. A survey on feature selection methods, 2014, Comput. Electr. Eng.

[10] Rocco A. Servedio, et al. Testing probability distributions using conditional samples, 2012, Electron. Colloquium Comput. Complex.

[11] Daniel M. Kane, et al. A New Approach for Testing Properties of Discrete Distributions, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[12] Avrim Blum, et al. Relevant Examples and Relevant Features: Thoughts from Computational Learning Theory, 1994.

[13] Gregory Valiant, et al. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[14] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[15] Ron Kohavi, et al. Feature Selection for Knowledge Discovery and Data Mining, 1998.

[16] Liam Paninski, et al. A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data, 2008, IEEE Transactions on Information Theory.

[17] Noam Nisan, et al. Constant depth circuits, Fourier transform, and learnability, 1989, 30th Annual Symposium on Foundations of Computer Science.

[18] Luc Devroye, et al. Combinatorial methods in density estimation, 2001, Springer Series in Statistics.

[19] T. Sanders, et al. Analysis of Boolean Functions, 2012, ArXiv.

[20] Ronitt Rubinfeld, et al. On the learnability of discrete distributions, 1994, STOC '94.

[21] Ilias Diakonikolas, et al. Learning Structured Distributions, 2016, Handbook of Big Data.

[22] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.