Learning decision trees using the Fourier spectrum

This work gives a polynomial-time algorithm for learning decision trees with respect to the uniform distribution. (The algorithm uses membership queries.) The decision tree model considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF(2)). The paper shows how to learn in polynomial time any function that can be approximated (in the L2 norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L1-norm (i.e., the sum of the absolute values of its Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that functions with polynomial L1-norm can be learned deterministically. The algorithm can also exactly identify a decision tree of depth d in time polynomial in 2^d and n. This result implies that trees of logarithmic depth can be identified in polynomial time.
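To make the quantities in the abstract concrete, the following minimal sketch computes, by brute force, the Fourier coefficients and the L1-norm of one small decision tree whose root performs a linear operation over GF(2). It is an illustration only, not the paper's learning algorithm: it enumerates all 2^n inputs rather than using membership queries, and the names chi, fourier_coefficients, l1_norm and the example tree are assumptions introduced here.

```python
import itertools

def chi(z, x):
    """Parity character chi_z(x) = (-1)^(sum_i z_i * x_i), with the sum taken over GF(2)."""
    return -1 if sum(zi & xi for zi, xi in zip(z, x)) % 2 else 1

def fourier_coefficients(f, n):
    """Exact coefficients E_x[f(x) * chi_z(x)] under the uniform distribution.

    Enumerates all 2^n inputs, so this is only feasible for small n.
    """
    points = list(itertools.product((0, 1), repeat=n))
    return {z: sum(f(x) * chi(z, x) for x in points) / len(points) for z in points}

def l1_norm(coeffs):
    """Sum of the absolute values of the Fourier coefficients."""
    return sum(abs(c) for c in coeffs.values())

def tree(x):
    """Hypothetical depth-2 tree with outputs in {-1, +1}.

    The root tests the GF(2) sum x0 + x1; each child tests a single variable.
    """
    if (x[0] ^ x[1]) == 0:
        return 1 if x[2] == 0 else -1
    return 1 if x[3] == 0 else -1

coeffs = fourier_coefficients(tree, 4)
nonzero = {z: c for z, c in coeffs.items() if c != 0}

print(len(nonzero))     # 4 nonzero coefficients out of 16
print(l1_norm(coeffs))  # 2.0
```

For this tree only four of the sixteen coefficients are nonzero, each of magnitude 1/2, so the function is sparse in the sense used above and its L1-norm is 2.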
