Weakly learning DNF and characterizing statistical query learning using Fourier analysis

We present new results, both positive and negative, on the well-studied problem of learning disjunctive normal form (DNF) expressions. We first prove that an algorithm due to Kushilevitz and Mansour [16] can be used to weakly learn DNF using membership queries in polynomial time, with respect to the uniform distribution on the inputs. This is the first positive result for learning unrestricted DNF expressions in polynomial time in any nontrivial formal model of learning. It provides a sharp contrast with the results of Kharitonov [15], who proved that AC0 is not efficiently learnable in the same model (given certain plausible cryptographic assumptions). We also present efficient learning algorithms in various models for the read-k and SAT-k subclasses of DNF.

For our negative results, we turn our attention to the recently introduced statistical query model of learning [11]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model [23], and practically every class known to be efficiently learnable in the PAC model is in fact learnable in the statistical query model [11]. Here we give a general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. This is a distribution-dependent quantity yielding upper and lower bounds on the number of statistical queries required for learning on any input distribution. As a corollary, we obtain that DNF expressions and decision trees are not even weakly learnable with respect to the uniform input distribution in polynomial time in the statistical query model. This result is information-theoretic and therefore does not rely on any unproven assumptions. It demonstrates that no simple modification of the existing algorithms in the computational learning theory literature for learning various restricted forms of DNF and decision trees from passive random examples (and also several algorithms proposed in the experimental machine learning communities, such as the ID3 algorithm for decision trees [22] and its variants) will solve the general problem. The unifying tool for all of our results is the Fourier analysis of a finite class of boolean functions on the hypercube.

● This research was sponsored in part by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. Support was also sponsored by the National Science Foundation under Grant No. CC-9119319. Blum was also supported in part by NSF National Young Investigator grant CCR-9357793. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government, or NSF.

† Contact author. Address: AT&T Bell Laboratories, Room 2A423, 600 Mountain Avenue, P.O. Box 636, Murray Hill, NJ 07974. Electronic mail: mkearns@research.att.com

‡ This research was supported in part by The Israel Science Foundation administered by The Israel Academy of Science and Humanities, and by a grant of the Israeli Ministry of Science and Technology.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. STOC '94, 5/94, Montreal, Quebec, Canada. © 1994 ACM 0-89791-663-8/94/0005..$3.50
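To make the Fourier machinery concrete, here is a small illustrative sketch (not the paper's algorithm, and not the Kushilevitz–Mansour procedure itself): it computes the Fourier coefficients f̂(S) = E[f(x)·χ_S(x)] of a toy DNF over {0,1}^4 under the uniform distribution by brute-force enumeration, where χ_S is the parity over the index set S. A coefficient of large magnitude means the corresponding parity agrees with f on a 1/2 + |f̂(S)|/2 fraction of inputs, which is exactly the kind of weak approximator the positive results exploit. The specific formula `dnf` is an invented example.

```python
# Toy sketch: exact Fourier coefficients of a small DNF under the
# uniform distribution. Conventions: f maps {0,1}^n -> {-1,+1}
# (True -> -1), and chi_S(x) = (-1)^(sum of x_i for i in S).
from itertools import combinations, product

n = 4

def dnf(x):
    """Example formula (x0 AND x1) OR (x2 AND NOT x3), as +/-1."""
    return -1 if (x[0] and x[1]) or (x[2] and not x[3]) else 1

def chi(S, x):
    """Parity character chi_S(x) over the index set S."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coeff(f, S):
    """f_hat(S) = E[f(x) chi_S(x)], averaged over all 2^n inputs."""
    return sum(f(x) * chi(S, x) for x in product((0, 1), repeat=n)) / 2 ** n

# All 2^n coefficients, indexed by subsets S of {0, ..., n-1}.
coeffs = {S: fourier_coeff(dnf, S)
          for r in range(n + 1) for S in combinations(range(n), r)}

# Parseval: the squared coefficients of a +/-1 function sum to 1.
assert abs(sum(v * v for v in coeffs.values()) - 1.0) < 1e-9

# The parity maximizing |f_hat(S)| is a weak hypothesis for f,
# since Pr[chi_S(x) = f(x)] = 1/2 + |f_hat(S)| / 2.
best = max(coeffs, key=lambda S: abs(coeffs[S]))
```

In the membership-query setting of the paper's positive result, one cannot afford to enumerate all 2^n coefficients; the point of the Kushilevitz–Mansour algorithm is to locate the large coefficients without exhaustive search. Conversely, the negative result shows that for DNF no parity is guaranteed to have an inverse-polynomially large coefficient detectable by statistical queries alone.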

[1] Jehoshua Bruck et al. Harmonic Analysis of Polynomial Threshold Functions, SIAM J. Discret. Math., 1990.

[2] Leslie G. Valiant et al. Computational limitations on learning from examples, JACM, 1988.

[3] Leslie G. Valiant et al. A theory of the learnable, CACM, 1984.

[4] R. Schapire. Toward Efficient Agnostic Learning, 1992.

[5] Noam Nisan et al. Constant depth circuits, Fourier transform, and learnability, 30th Annual Symposium on Foundations of Computer Science, 1989.

[6] Michael Kearns et al. Efficient noise-tolerant learning from statistical queries, STOC, 1993.

[7] R. Schapire et al. Toward Efficient Agnostic Learning, 1994.

[8] Richard J. Lipton et al. Proceedings of the tenth annual ACM symposium on Theory of computing, 1978.

[9] Nader H. Bshouty et al. Exact learning via the Monotone theory, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, 1993.

[10] Leonard Pitt et al. Exact learning of read-k disjoint DNF and not-so-disjoint DNF, COLT '92, 1992.

[11] Yishay Mansour et al. An O(n^(log log n)) learning algorithm for DNF under the uniform distribution, COLT '92, 1992.

[12] Thomas R. Hancock et al. Learning 2µ DNF Formulas and kµ Decision Trees, COLT, 1991.

[13] Eyal Kushilevitz et al. Learning decision trees using the Fourier spectrum, STOC '91, 1991.

[14] Yishay Mansour et al. An O(n^(log log n)) Learning Algorithm for DNF under the Uniform Distribution, J. Comput. Syst. Sci., 1995.

[15] Noam Nisan et al. Constant depth circuits, Fourier transform, and learnability, JACM, 1993.

[16] Michael Kharitonov et al. Cryptographic hardness of distribution-specific learning, STOC, 1993.

[17] David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension, JACM, 1989.

[18] Thomas R. Hancock et al. Learning 2µ DNF formulas and kµ decision trees, COLT, 1991.

[19] Avrim Blum et al. Fast learning of k-term DNF formulas with queries, STOC '92, 1992.

[20] Leslie G. Valiant et al. On the learnability of Boolean formulae, STOC, 1987.

[21] Roni Khardon. On Using the Fourier Transform to Learn Disjoint DNF, Inf. Process. Lett., 1994.

[22] Lisa Hellerstein et al. Read-thrice DNF is hard to learn with membership and equivalence queries, Proceedings, 33rd Annual Symposium on Foundations of Computer Science, 1992.

[23] Sean W. Smith et al. Improved learning of AC0 functions, COLT '91, 1991.

[24] Eyal Kushilevitz et al. On learning visual concepts and DNF formulae, COLT '93, 1993.

[25] H. Aizenstein. Exact learning of read-twice DNF formulas, Proceedings 32nd Annual Symposium of Foundations of Computer Science, 1991.