Decision tree classification with bounded number of errors

Abstract

Oblivious decision trees are decision trees in which all nodes at the same level test the same attribute. These trees have been studied in the context of feature selection. In this paper, we study the problem of constructing an oblivious decision tree that incurs at most $k$ classification errors, where $k$ is a given integer. We present a randomized rounding algorithm that, given a parameter $0 < \epsilon < 1/2$, builds an oblivious decision tree with cost at most $\frac{3}{1-2\epsilon} \ln(n) \, \mathrm{OPT}(I)$ and produces at most $k/\epsilon$ errors, where $\mathrm{OPT}(I)$ is the optimal cost and $n$ is the number of objects. The probability of failure of this algorithm is at most $(n-1)/(2n^2)$. The logarithmic factor in the cost of the tree is the best possible, even for $k = 0$, unless P = NP.
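
The definition of an oblivious tree can be made concrete with a small sketch. Because every level tests a single attribute, a complete oblivious tree is described by the ordered list of attributes it tests, and the leaf an object reaches is determined by that object's values on those attributes. The Python below assumes (this is not stated in the abstract) that each leaf predicts the majority class of the objects reaching it, so every other object in that leaf counts as one error, and that the cost of the tree is the total cost of the attributes it tests. The function names are hypothetical, and the sketch only evaluates a given tree; it is not the paper's randomized rounding algorithm.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def oblivious_tree_errors(objects: Sequence[Sequence[Hashable]],
                          labels: Sequence[Hashable],
                          attribute_order: Sequence[int]) -> int:
    """Count the errors of an oblivious tree that tests attribute_order[i] at level i.

    Assumption: each leaf predicts the majority class of the objects reaching it.
    """
    # The leaf reached by an object is the tuple of its values on the tested attributes.
    leaves = defaultdict(Counter)
    for obj, label in zip(objects, labels):
        leaf = tuple(obj[a] for a in attribute_order)
        leaves[leaf][label] += 1
    # Errors in a leaf = objects in that leaf not belonging to its majority class.
    return sum(sum(c.values()) - max(c.values()) for c in leaves.values())


def tree_cost(attribute_costs: Sequence[float], attribute_order: Sequence[int]) -> float:
    """Hypothetical cost model: the sum of the costs of the attributes the tree tests."""
    return sum(attribute_costs[a] for a in attribute_order)


# Usage: attribute 0 separates the two classes perfectly, attribute 1 does not.
objs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labs = ["a", "a", "b", "b"]
print(oblivious_tree_errors(objs, labs, [0]))  # 0 errors
print(oblivious_tree_errors(objs, labs, [1]))  # 2 errors
print(tree_cost([3.0, 1.0], [0]))              # 3.0
```

With $k = 1$, only the tree testing attribute 0 would be feasible here, even though it is the more expensive attribute under the assumed costs; this is the tension between cost and errors that the problem captures.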

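The parameter $\epsilon$ trades cost against errors: as $\epsilon \to 1/2$ the cost factor $3/(1-2\epsilon)$ grows without bound, while as $\epsilon \to 0$ the error bound $k/\epsilon$ does. The sketch below only evaluates the two bounds stated in the abstract for given $\epsilon$, $k$, $n$ and $\mathrm{OPT}(I)$ (with $\ln$ as the natural logarithm); it does not run the algorithm, and the function name `guarantees` is hypothetical.

```python
import math
from typing import Tuple


def guarantees(eps: float, k: int, n: int, opt: float) -> Tuple[float, float]:
    """Return (cost bound, error bound) from the abstract, valid for 0 < eps < 1/2."""
    if not 0 < eps < 0.5:
        raise ValueError("eps must satisfy 0 < eps < 1/2")
    cost_bound = 3.0 / (1.0 - 2.0 * eps) * math.log(n) * opt  # (3/(1-2eps)) ln(n) OPT(I)
    error_bound = k / eps                                      # at most k/eps errors
    return cost_bound, error_bound


# eps = 1/4 gives a cost factor of 6 ln(n) and at most 4k errors.
cost, errors = guarantees(0.25, k=2, n=100, opt=1.0)
print(round(cost, 2), errors)
```

For example, $\epsilon = 0.1$ gives a cost factor of $3.75 \ln(n)$ with up to $10k$ errors, while $\epsilon = 0.4$ gives $15 \ln(n)$ with up to $2.5k$ errors.
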