Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization

In this paper, we propose an extended variant of the parametric mixture model (PMM), which we recently proposed for multi-class, multi-labeled text categorization. In the extended model (EPMM), latent categories are incorporated into the PMM so that the model's flexibility can be adaptively controlled according to the data, while maintaining the validity of the parametric mixture assumption underlying the original PMM. In the multi-label setting, we experimentally compare a Naive Bayes classifier (NB), Support Vector Machines (SVM), PMM, and EPMM in terms of both classification performance and robustness against classification noise. The results show that EPMM achieves higher classification performance than PMM while retaining PMM's advantage of greater robustness against noise compared with NB and SVM.
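For concreteness, the following is a minimal sketch of the parametric mixture assumption referred to above, written in our own notation (the symbols $\boldsymbol{x}$, $\boldsymbol{y}$, $\theta_{l,i}$, and the uniform mixing weights are assumptions for illustration, not taken from this abstract): the word distribution of a multi-labeled document is modeled as a mixture of the word distributions of its relevant categories. For a document with word-frequency vector $\boldsymbol{x}=(x_1,\ldots,x_V)$ and binary category vector $\boldsymbol{y}=(y_1,\ldots,y_L)$,
\[
  P(\boldsymbol{x} \mid \boldsymbol{y}, \Theta) \;\propto\; \prod_{i=1}^{V}\Big(\sum_{l=1}^{L} h_l(\boldsymbol{y})\,\theta_{l,i}\Big)^{x_i},
  \qquad h_l(\boldsymbol{y}) = \frac{y_l}{\sum_{m=1}^{L} y_m},
\]
where $\theta_{l,i}$ is the probability of word $w_i$ under category $l$ with $\sum_{i}\theta_{l,i}=1$. Under this sketch, EPMM would enrich the set of component distributions with latent categories while keeping this mixture form intact.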