Improving Text Classification by Shrinkage in a Hierarchy of Classes

When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. This paper shows that the accuracy of a naive Bayes text classifier can be significantly improved by taking advantage of a hierarchy of classes. We adopt an established statistical technique called shrinkage, which smooths the parameter estimates of a data-sparse child with those of its parent in order to obtain more robust parameter estimates. The same approach is employed in deleted interpolation, a technique for smoothing n-grams in language modeling for speech recognition. Our method scales well to large data sets with numerous categories in large hierarchies. Experimental results on three real-world data sets from UseNet, Yahoo, and corporate web pages show improved performance, with a reduction in error of up to 29% over the traditional flat classifier.
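To make the idea of shrinkage concrete, here is a minimal Python sketch for a single leaf-to-root path. It assumes, in the style of deleted interpolation, that a leaf's smoothed word distribution is a convex combination of the maximum-likelihood distributions at every node on the path plus a uniform distribution, with the mixture weights fit by EM on held-out tokens. The helper names (`mle`, `fit_shrinkage_weights`, `shrunk_estimate`, `nb_log_score`), the toy hierarchy and counts, and the choice to pool children's counts into the parent are illustrative assumptions, not the paper's exact procedure.

```python
import math
from collections import Counter

# Illustrative sketch with hypothetical helper names: shrinkage of naive Bayes
# word probabilities along one leaf-to-root path. Mixture weights are fit by
# EM on held-out tokens (deleted-interpolation style); this is an assumption,
# not a reproduction of the paper's code.

def mle(counts, vocab):
    """Maximum-likelihood word distribution for one node's training counts."""
    total = sum(counts[w] for w in vocab)
    if total == 0:
        return {w: 0.0 for w in vocab}
    return {w: counts[w] / total for w in vocab}

def fit_shrinkage_weights(path_dists, held_out, vocab, iters=50):
    """EM for weights lambda_j over the path (leaf first, root last) plus a
    uniform distribution. Returns the weights and the component list."""
    uniform = {w: 1.0 / len(vocab) for w in vocab}
    comps = list(path_dists) + [uniform]
    lam = [1.0 / len(comps)] * len(comps)
    for _ in range(iters):
        resp = [0.0] * len(comps)
        for w in held_out:
            # E-step: posterior probability that component j generated token w.
            joint = [l * c[w] for l, c in zip(lam, comps)]
            z = sum(joint)  # > 0 because the uniform weight stays positive
            for j, p in enumerate(joint):
                resp[j] += p / z
        # M-step: the new weights are the average responsibilities.
        lam = [r / len(held_out) for r in resp]
    return lam, comps

def shrunk_estimate(lam, comps, vocab):
    """theta_w = sum_j lambda_j * P_j(w), a convex mix of the node estimates."""
    return {w: sum(l * c[w] for l, c in zip(lam, comps)) for w in vocab}

def nb_log_score(doc_tokens, log_prior, theta):
    """Naive Bayes log score of a document under one class's word distribution."""
    return log_prior + sum(math.log(theta[w]) for w in doc_tokens)

# Toy hierarchy with made-up counts: root -> {sports -> {soccer, tennis}, politics}.
vocab = ["ball", "goal", "court", "vote"]
soccer = Counter({"ball": 2, "goal": 1})
tennis = Counter({"ball": 3, "court": 4})
politics = Counter({"vote": 5, "court": 1})
sports = soccer + tennis              # assumed: parent pools its children's counts
root = sports + politics
held_out = ["goal", "ball", "court"]  # soccer tokens kept out of every node's counts

path = [mle(soccer, vocab), mle(sports, vocab), mle(root, vocab)]
lam, comps = fit_shrinkage_weights(path, held_out, vocab)
theta = shrunk_estimate(lam, comps, vocab)
print([round(l, 3) for l in lam])
print(nb_log_score(["court", "goal"], math.log(0.25), theta))
```

Because the uniform component always keeps a positive weight, every word in the vocabulary receives nonzero probability. The log score therefore stays finite even for a word such as "court" that never occurs in the leaf's own training counts. This is the data-sparsity problem that borrowing strength from the parent is meant to address.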
