An Incremental Bayes Classification Model

Classification is an active research area in machine learning, pattern recognition, and data mining. Incremental learning is an effective approach to learning classification knowledge from massive data, especially when labeled training examples are expensive to obtain. This paper first discusses the difference between Bayesian estimation and classical parameter estimation, and states the fundamental principle for incorporating prior knowledge into Bayesian learning. We then present an incremental Bayesian learning model, which describes how the learner's beliefs change as prior knowledge is combined with information from new examples; choosing a Dirichlet prior distribution, we show this process in detail. The second part focuses on the incremental process itself. New examples arrive in one of two states: labeled or unlabeled. For labeled examples, the classification parameters are easy to update thanks to the conjugate Dirichlet distribution, so the key problem is learning from unlabeled examples. Unlike the method of Kamal Nigam, which learns from unlabeled examples with the EM algorithm, we focus on selecting the next example to learn from. The paper gives a method that measures classification loss under the 0-1 loss function and selects the examples that minimize this loss. To improve the algorithm's performance, a pool-based technique is introduced: in each round, the classification loss is computed only for the examples in the pool. Because the basic operations of learning are updating the classification parameters and classifying test instances incrementally, we give approximate expressions for both. To test the algorithm's effectiveness, we conduct an experiment on the mushroom data set from the UCI repository: the initial training set contains 6 labeled examples, and several unlabeled examples are then added. The experimental results show that the algorithm is feasible and effective.
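
As background for the conjugate update mentioned above, recall the standard Dirichlet-multinomial identity (stated here in our own notation; the paper's approximate expressions may differ): with a Dirichlet prior Dir(α_1, …, α_K) on the class or feature-value proportions and observed counts n_1, …, n_K, the posterior remains Dirichlet and the predictive probability has closed form,

\[
p(\boldsymbol{\theta} \mid D) = \mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K),
\qquad
P(k \mid D) = \frac{\alpha_k + n_k}{\sum_{j=1}^{K} (\alpha_j + n_j)} .
\]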
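
The following minimal Python sketch shows how this identity yields an incrementally updatable naive Bayes classifier over discrete features. It is an illustration under our own assumptions, not the paper's implementation; the class name IncrementalNB and the arities parameter (the number of values each feature can take) are hypothetical.

from collections import defaultdict

class IncrementalNB:
    """Incremental naive Bayes with symmetric Dirichlet priors.

    Because the Dirichlet is conjugate to the multinomial, updating the
    posterior with a labeled example reduces to incrementing counts, and
    prediction uses the closed-form predictive probabilities above.
    """

    def __init__(self, classes, arities, alpha=1.0):
        self.classes = list(classes)   # class labels
        self.arities = list(arities)   # number of values per discrete feature
        self.alpha = alpha             # symmetric Dirichlet hyperparameter
        self.n = 0                     # total labeled examples absorbed
        self.class_counts = defaultdict(float)   # n_c
        self.feat_counts = defaultdict(float)    # n_{c,j,v}

    def update(self, x, c):
        """Conjugate update: absorb one labeled example x = (v_1, ..., v_d)."""
        self.n += 1
        self.class_counts[c] += 1
        for j, v in enumerate(x):
            self.feat_counts[(c, j, v)] += 1

    def posterior(self, x):
        """Normalized posterior P(c | x) built from Dirichlet predictives."""
        scores = {}
        for c in self.classes:
            # Predictive class probability (alpha_c + n_c) / (K*alpha + N).
            p = (self.alpha + self.class_counts[c]) / (
                len(self.classes) * self.alpha + self.n)
            for j, v in enumerate(x):
                # Predictive probability of feature j taking value v in class c.
                p *= (self.alpha + self.feat_counts[(c, j, v)]) / (
                    self.arities[j] * self.alpha + self.class_counts[c])
            scores[c] = p
        z = sum(scores.values()) or 1.0
        return {c: s / z for c, s in scores.items()}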
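
For the unlabeled case, one plausible reading of the selection rule is that the expected 0-1 loss of an example x under the current model is 1 − max_c P(c | x), so each round scores only the examples in the pool, picks the minimizer, labels it with the model's own prediction, and applies the same conjugate update. A sketch under that assumption (incremental_round is a hypothetical helper, not the paper's algorithm):

def incremental_round(model, pool):
    """One pool-based round: pick the minimum-loss example and absorb it.

    The loss is computed only over the pool, mirroring the pool-based
    technique; pool is a list of feature tuples.
    """
    x = min(pool, key=lambda e: 1.0 - max(model.posterior(e).values()))
    post = model.posterior(x)
    c_hat = max(post, key=post.get)   # label inferred by the current model
    model.update(x, c_hat)            # same conjugate count update as labeled data
    pool.remove(x)
    return x, c_hat

A driver would first seed the model with the labeled training examples (six in the paper's experiment) and then call incremental_round repeatedly until the pool is exhausted or a labeling budget is reached.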