On the Optimality of Probability Estimation by Random Decision Trees

Random decision tree is an ensemble of decision trees. The feature at any node of a tree in the ensemble is chosen randomly from remaining features. A chosen discrete feature on a decision path cannot be chosen again. Continuous feature can be chosen multiple times, however, with a different splitting value each time. During classification, each tree outputs raw posterior probability. The probabilities from each tree in the ensemble are averaged as the final posterior probability estimate. Although remarkably simple and somehow counter-intuitive, random decision tree has been shown to be highly accurate under 0-1 loss and cost-sensitive loss functions. Preliminary explanation of its high accuracy is due to the "error-tolerance" property of probabilistic decision making. Our study has shown that the actual reason for random tree's superior performance is due to its optimal approximation to each example's true probability to be a member of a given class.