Boosting Using Branching Programs

It is known that decision tree learning can be viewed as a form of boosting. Under a weak learning hypothesis, one can show that the training error of a decision tree declines as $|T|^{-b}$, where $|T|$ is the size of the decision tree and $b$ is a constant determined by the weak learning hypothesis. Here we consider the case of decision DAGs, that is, decision trees in which a given node can be shared by different branches of the tree; these are also called branching programs (BPs). Node sharing allows a branching program to be exponentially more compact than the corresponding decision tree. We show that under the same weak learning assumption used for decision tree learning there exists a greedy BP-growth algorithm whose training error is guaranteed to decline as $2^{-b\sqrt{|T|}}$, where $|T|$ is the size of the branching program and $b$ is a constant determined by the weak learning hypothesis. Therefore, from the perspective of boosting theory, branching programs are exponentially more efficient than decision trees.
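To make the gap concrete, one can invert the two stated rates to see how large each model must be to reach a target training error $\epsilon$; the following is a back-of-the-envelope consequence of those rates, treating $b$ as the same abstract constant in both bounds.
\[
|T|_{\mathrm{tree}}^{-b} \le \epsilon \;\Longleftrightarrow\; |T|_{\mathrm{tree}} \ge \left(\tfrac{1}{\epsilon}\right)^{1/b},
\qquad
2^{-b\sqrt{|T|_{\mathrm{BP}}}} \le \epsilon \;\Longleftrightarrow\; |T|_{\mathrm{BP}} \ge \left(\tfrac{\log_2(1/\epsilon)}{b}\right)^{2}.
\]
Thus the required tree size grows polynomially in $1/\epsilon$, while the required branching-program size grows only polylogarithmically in $1/\epsilon$, which is the sense in which branching programs are exponentially more efficient.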