Paper Information - Words as Rules: Feature Selection in Text Categorization

Text Categorization problems usually involve a large amount of noisy and irrelevant information. In this paper we propose applying measures taken from Machine Learning to Feature Selection. The classifier used is a Support Vector Machine. Experiments on two different corpora show that some of the new measures perform better than the traditional Information Theory measures.

José Ranilla | Irene Díaz | Elías F. Combarro | Elena Montañés | José Ramón Quevedo
