Application of Multinomial Mixture Model to Text Classification

The goal of text document classification is to assign a new document to one of a set of predefined classes based on its contents. In this paper, a mixture of multinomial distributions is proposed as a model for the class-conditional distributions in the document classification task. Documents are represented as vectors using a bag-of-words approach. It is shown that the proposed model improves the accuracy of the Bayes document classifier compared with Bayes classifiers based on the multivariate Bernoulli model, the multinomial model, and the multivariate Bernoulli mixture model. Experimental results on the Reuters and the Newsgroups data sets indicate the effectiveness of the multinomial mixture model. Furthermore, for small training data sets, classification accuracy increases when the multiclass Bhattacharyya distance is used instead of average mutual information as the feature selection criterion.
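As a rough illustration of the decision rule described above, the following minimal sketch scores a bag-of-words count vector under a multinomial mixture for each class and picks the class with the highest posterior. The function names, the four-word vocabulary, and all parameter values are hypothetical and chosen only for illustration; they are not taken from the paper. Parameter estimation (for example by EM) and feature selection are not shown, and every word probability is assumed to be nonzero, as it would be after smoothing.

```python
import math


def log_multinomial_mixture(counts, weights, word_probs):
    """Log of sum_m w_m * prod_t theta[m][t] ** counts[t].

    The multinomial coefficient is dropped: it depends only on the
    document, so it does not change which class scores highest.
    Assumes all weights and word probabilities are strictly positive.
    """
    component_logs = [
        math.log(w) + sum(x * math.log(p) for x, p in zip(counts, theta) if x > 0)
        for w, theta in zip(weights, word_probs)
    ]
    # Log-sum-exp over components for numerical stability.
    top = max(component_logs)
    return top + math.log(sum(math.exp(v - top) for v in component_logs))


def classify(counts, priors, class_models):
    """Bayes rule: argmax over classes of log prior + log class-conditional."""
    scores = {
        label: math.log(priors[label])
        + log_multinomial_mixture(counts, weights, word_probs)
        for label, (weights, word_probs) in class_models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy vocabulary: ["ball", "game", "vote", "law"].
# Each class maps to (mixture weights, one word distribution per component).
class_models = {
    "sports": ([0.5, 0.5], [[0.6, 0.3, 0.05, 0.05], [0.3, 0.6, 0.05, 0.05]]),
    "politics": ([1.0], [[0.05, 0.05, 0.5, 0.4]]),
}
priors = {"sports": 0.5, "politics": 0.5}

print(classify([3, 1, 0, 0], priors, class_models))
print(classify([0, 0, 2, 3], priors, class_models))
```

With a single component per class, this sketch reduces to the plain multinomial Bayes classifier; additional components let one class cover several distinct word-usage patterns.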
