Using Language Models for Text Classification

This paper describes an approach to text classification based on language models. The approach is a natural extension of the traditional Naive Bayes classifier, in which Laplace smoothing is replaced by more sophisticated smoothing methods. We tested four smoothing methods commonly used in information retrieval. Our experimental results show that language-model-based classifiers outperform the traditional Naive Bayes classifier. In addition, we extend the existing smoothing methods with a smoothing-scale factor that depends on the amount of training data available for each class, which further improves classification performance.
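To make the idea concrete, here is a minimal sketch (not the paper's implementation) of a language-model classifier in this family, using Jelinek-Mercer interpolation — one smoothing method commonly used in information retrieval — in place of Laplace smoothing; all function names and the interpolation weight are illustrative assumptions.

```python
import math
from collections import Counter

def train(docs_by_class):
    """Build per-class unigram counts and a corpus-wide background model.

    docs_by_class maps a class label to a list of tokenized documents.
    """
    class_counts = {c: Counter(w for d in docs for w in d)
                    for c, docs in docs_by_class.items()}
    background = Counter()
    for counts in class_counts.values():
        background.update(counts)
    bg_total = sum(background.values())
    return class_counts, {w: n / bg_total for w, n in background.items()}

def log_score(doc, counts, background, lam=0.5):
    """Log-likelihood of doc under a class language model smoothed by
    Jelinek-Mercer interpolation with the background model:
        p(w|c) = (1 - lam) * p_ml(w|c) + lam * p(w|corpus)
    A class-dependent lam (e.g. shrinking with more training data, as the
    paper's smoothing-scale factor suggests) could be substituted here.
    """
    total = sum(counts.values())
    score = 0.0
    for w in doc:
        p = (1 - lam) * (counts.get(w, 0) / total) + lam * background.get(w, 1e-9)
        score += math.log(p)
    return score

def classify(doc, class_counts, background, lam=0.5):
    """Assign doc to the class whose smoothed language model scores it highest."""
    return max(class_counts,
               key=lambda c: log_score(doc, class_counts[c], background, lam))
```

With lam = 0 this reduces to the unsmoothed maximum-likelihood Naive Bayes scorer; the interpolation weight controls how strongly unseen words are backed off to the corpus-wide model.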