Paper information - Boosting to correct inductive bias in text classification

Boosting to correct inductive bias in text classification

This paper studies the effects of boosting in the context of different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive biases of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting is not effective in improving the performance of the base classifiers on common categories. However, the effect of boosting for rare categories varies across classifiers: for SVMs and Decision Trees, we achieved a 13-17% performance improvement in macro-averaged F1 measure, but did not obtain substantial improvement for the other two classifiers. This interesting finding of boosting on rare categories has not been reported before.

Yiming Yang | Jaime G. Carbonell | Yan Liu

[1] Yoram Singer, et al. BoosTexter: A Boosting-based System for Text Categorization, 2000, Machine Learning.

[2] James Allan, et al. The effect of adding relevance information in a relevance feedback environment, 1994, SIGIR '94.

[3] Yiming Yang, et al. High-performing feature selection for text classification, 2002, CIKM '02.

[4] John D. Lafferty, et al. Boosting and Maximum Likelihood for Exponential Models, 2001, NIPS.

[5] Yiming Yang, et al. An Evaluation of Statistical Approaches to Text Categorization, 1999, Information Retrieval.

[6] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[7] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[8] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[9] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[10] David E. Johnson, et al. Maximizing Text-Mining Performance, 1999.

[11] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.