The importance of stop word removal on recall values in text categorization

Given a data set and a learning task such as classification, there are two main motives for reducing the data set. First, reduction may improve the performance of the learning algorithm. Second, a smaller data set requires less storage space and less computation time. Our purpose is to determine the importance of several basic reduction techniques for Support Vector Machines by comparing the relative performance improvement each yields when applied to the standard REUTERS-21578 benchmark.
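As a rough illustration of the two quantities the abstract refers to, the sketch below removes stop words from a toy corpus, reports how much the vocabulary shrinks, and computes recall for a set of predictions. It is not the authors' pipeline: the stop word list, the two example sentences, and the helper names (`tokenize`, `remove_stop_words`, `vocabulary`, `recall`) are all assumptions made for illustration, and no SVM training or REUTERS-21578 data is involved.

```python
# Hypothetical, deliberately tiny stop word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]


def vocabulary(docs):
    """Return the set of distinct tokens over all tokenized documents."""
    return {t for doc in docs for t in doc}


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy documents, invented for this example.
docs = [
    "The price of crude oil rose in the market",
    "Wheat and corn exports to the region fell",
]
tokenized = [tokenize(d) for d in docs]
reduced = [remove_stop_words(d) for d in tokenized]

full_vocab = vocabulary(tokenized)
reduced_vocab = vocabulary(reduced)
print(f"vocabulary size: {len(full_vocab)} -> {len(reduced_vocab)}")

# Made-up labels and predictions, only to show how recall is computed.
print(f"recall: {recall([1, 1, 1, 0], [1, 0, 1, 1]):.3f}")
```

In an experiment of the kind the abstract describes, the same recall measure would be computed for a classifier trained once on the full representation and once on the reduced one, so that the two values can be compared.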