Learning stochastic decision trees

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $\eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(\log(s/\varepsilon)/\varepsilon^2)}$ and returns a hypothesis with error within an additive $2\eta + \varepsilon$ of the Bayes optimal. An additive $2\eta$ is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of $O(\eta) + \varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

2012 ACM Subject Classification: Theory of computation → Boolean function learning
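
The guarantee stated in the abstract can be written compactly as follows. The notation is an assumption made here for illustration and is not taken from the paper: $h$ is the returned hypothesis, $\mathrm{error}(h)$ is its probability of mislabeling a fresh uncorrupted sample, $\mathrm{opt}$ is the Bayes optimal error for the target tree, and $n$ is the number of input variables.

```latex
\[
  \mathrm{error}(h) \;\le\; \mathrm{opt} + 2\eta + \varepsilon,
  \qquad
  \text{running time } n^{O(\log(s/\varepsilon)/\varepsilon^{2})}.
\]
```

Read this way, the $\varepsilon$ term is the accuracy parameter chosen by the user, while the $2\eta$ term is the part the abstract describes as information-theoretically unavoidable.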
