Naive Bayes Classification Given Probability Estimation Trees

Tree induction is one of the most effective and widely used approaches to classification. Unfortunately, decision trees such as C4.5 have been found to provide poor probability estimates. Through empirical studies, Provost and Domingos found that probability estimation trees (PETs) give fairly good probability estimates. However, unlike conventional decision trees, PETs perform worse when pruned. Good probability estimates therefore usually require large trees, which are poor in terms of model transparency. In this paper, two hybrid models that combine the naive Bayes classifier with PETs are proposed in order to achieve good performance without losing too much transparency. The first model uses naive Bayes estimation given a PET, and the second model uses a group of small PETs as naive Bayes estimators. Empirical studies show that the first model outperforms the PET model at shallow depths and that the second model is equivalent to naive Bayes and the PET model.
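
To make the idea of using small PETs as naive Bayes estimators concrete, the following is a minimal sketch of one plausible reading, not necessarily the formulation used in the paper. It assumes that each small tree is grown on its own subset of attributes and returns a class posterior for the instance, and that these attribute subsets are conditionally independent given the class. Under that assumption, Bayes' rule gives P(c | x) proportional to P(c) times the product over trees of P(c | x_i) / P(c). The function name `combine_pet_estimates` and the dictionary-based inputs are hypothetical, chosen only for illustration.

```python
from math import prod


def combine_pet_estimates(prior, tree_posteriors):
    """Combine per-tree class posteriors under a naive Bayes assumption.

    prior: dict mapping each class c to P(c); every prior must be nonzero.
    tree_posteriors: list of dicts, one per small tree, each mapping
        class c to that tree's estimate P(c | x_i) for the instance.
    Returns a dict mapping each class to the normalised combined posterior.

    Assumption (not from the source): the attribute subsets used by the
    trees are conditionally independent given the class.
    """
    # Unnormalised score: P(c) * prod_i [ P(c | x_i) / P(c) ]
    scores = {
        c: p_c * prod(post[c] / p_c for post in tree_posteriors)
        for c, p_c in prior.items()
    }
    # Normalise so the scores sum to one over the classes.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Illustrative numbers only, not results from the paper.
example = combine_pet_estimates(
    {"pos": 0.5, "neg": 0.5},
    [{"pos": 0.8, "neg": 0.2}, {"pos": 0.6, "neg": 0.4}],
)
```

With no trees supplied, the function simply returns the class prior, which matches the behaviour of naive Bayes when no attribute has been observed.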