Incorporating Confidence in a Naive Bayesian Classifier

Naive Bayes is a relatively simple classification method that can be used, e.g., to rate TV programs as interesting or uninteresting to a user. If the training set consists of instances chosen randomly from the instance space, the posterior probability estimates are random variables. Their statistical properties can be used to calculate confidence intervals around them, enabling more refined classification strategies than the usual argmax operator. This may alleviate the cold-start problem and provide additional feedback to the user. In this paper, we give an explicit expression to estimate the variances of the posterior probability estimates from the training data, and we investigate a strategy that refrains from classification whenever the confidence interval around the largest posterior probability overlaps with any of the other intervals. We show that, in a TV-program recommender, the classification error rate can be significantly reduced at the cost of lower coverage, i.e., a smaller fraction of classifiable instances.
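The abstaining decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it does not reproduce the paper's explicit variance expression, and instead assumes the caller already has a posterior estimate and a standard deviation for each class. It also assumes symmetric intervals of the form estimate ± z·std with z = 1.96 (a normal-approximation width of roughly 95%). The function names `classify_with_confidence` and `error_rate_and_coverage` are hypothetical, and the error rate is assumed to be measured only over the instances that were actually classified.

```python
from typing import Optional, Sequence, Tuple


def classify_with_confidence(
    posteriors: Sequence[float],
    std_devs: Sequence[float],
    z: float = 1.96,  # assumed interval half-width multiplier (~95%, normal approx.)
) -> Optional[int]:
    """Return the index of the class with the largest posterior estimate,
    or None (refrain from classifying) if its confidence interval overlaps
    the interval of any other class.

    Hypothetical sketch: the standard deviations are taken as given inputs;
    how they are estimated from the training data is not modeled here.
    """
    if not posteriors or len(posteriors) != len(std_devs):
        raise ValueError("need one standard deviation per posterior estimate")
    # The usual argmax choice.
    best = max(range(len(posteriors)), key=lambda c: posteriors[c])
    best_lower = posteriors[best] - z * std_devs[best]
    for c, (p, s) in enumerate(zip(posteriors, std_devs)):
        # Since p <= posteriors[best], the two intervals overlap exactly when
        # the upper bound of class c reaches the lower bound of the best class.
        if c != best and p + z * s >= best_lower:
            return None  # overlap: abstain
    return best


def error_rate_and_coverage(
    predictions: Sequence[Optional[int]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Error rate over classified instances, and coverage (fraction classified)."""
    classified = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(classified) / len(labels)
    errors = sum(1 for p, y in classified if p != y)
    error_rate = errors / len(classified) if classified else 0.0
    return error_rate, coverage
```

For example, estimates of 0.7 and 0.3 with standard deviations of 0.05 give non-overlapping intervals, so class 0 is returned. Estimates of 0.55 and 0.45 with standard deviations of 0.1 overlap, so the classifier abstains. Raising `z` widens the intervals, which trades coverage for a lower error rate on the instances that remain classified.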