Threshold-Based Optimization of the Naive Bayesian Algorithm

For text classification and spam filtering, the Naive Bayesian algorithm estimates which class a text belongs to from statistical probability values derived from the features of the training samples. Because it multiplies many small conditional probabilities, it is prone to a numerical overflow problem: the product can fall below the smallest representable floating-point value. This article optimizes the algorithm by introducing a threshold. For each class, the optimization strategy counts how many times the class's probabilities exceed the threshold and accumulates the probability values, then compares the classes on both quantities at the same time. Compared with the existing method, experimental results show that the new method not only solves the overflow problem but also effectively improves classification performance.
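The sketch below contrasts the two ideas. It is one possible reading of the abstract, not the authors' implementation, and it rests on several assumptions. The per-class word probabilities `word_probs` are given. "Accumulated probability" means a plain sum over all words. Classes are ranked first by the exceed count and then by the sum as a tie-breaker. The function names `product_scores` and `threshold_classify`, the toy data, and the threshold value are all hypothetical.

```python
def product_scores(word_probs, words):
    """Classic Naive Bayes scoring: multiply P(word | class) over all words.
    For long documents the product underflows to 0.0 for every class."""
    scores = {}
    for cls, probs in word_probs.items():
        score = 1.0
        for w in words:
            score *= probs.get(w, 0.0)
        scores[cls] = score
    return scores


def threshold_classify(word_probs, words, threshold):
    """Hypothetical threshold rule (an interpretation, not the paper's code):
    for each class, count the words whose P(word | class) exceeds `threshold`
    and accumulate the probabilities. Pick the class with the largest count,
    breaking ties by the accumulated sum."""
    best_cls, best_key = None, None
    for cls, probs in word_probs.items():
        exceed_count = 0
        accumulated = 0.0
        for w in words:
            p = probs.get(w, 0.0)
            if p > threshold:
                exceed_count += 1
            accumulated += p  # addition instead of multiplication: no underflow
        key = (exceed_count, accumulated)
        if best_key is None or key > best_key:
            best_cls, best_key = cls, key
    return best_cls


# Hypothetical toy model and a long (600-word) document.
word_probs = {
    "spam": {"free": 0.05, "offer": 0.04, "meeting": 0.001},
    "ham": {"free": 0.002, "offer": 0.003, "meeting": 0.03},
}
document = ["free", "offer", "free"] * 200

print(product_scores(word_probs, document))  # {'spam': 0.0, 'ham': 0.0}
print(threshold_classify(word_probs, document, threshold=0.01))  # spam
```

With the plain product, both class scores collapse to `0.0` and the classes become indistinguishable. The threshold rule only counts and adds, so its values stay representable and it still separates the classes.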