Paper information - Text genre classification with genre-revealing and subject-revealing features

Subject, or propositional content, has been the focus of most classification research. Genre, or style, is a different and important property of text, and automatic text genre classification is becoming important for classification and retrieval as well as for some natural language processing research. In this paper, we present a method for automatic genre classification based on statistically selected features obtained from both subject-classified and genre-classified training data. The experimental results show that the proposed method outperforms a direct application of a statistical learner often used for subject classification. We also observe that the deviation formula and the discrimination formula using document frequency ratios work as expected. We conjecture that this dual feature set approach can be generalized to improve the performance of subject classification as well.

Sung-Hyon Myaeng | Yong-Bae Lee
