Optimally Combining Positive and Negative Features for Text Categorization

This paper presents a novel local feature selection approach for text categorization. For each category, it constructs a feature set by selecting one set of terms highly indicative of membership and another set of terms highly indicative of non-membership, then taking the union of the two sets. The size ratio of the two sets is chosen empirically to obtain optimal performance. This contrasts with standard local feature selection approaches, which either (1) select only the terms most indicative of membership, or (2) implicitly, but not optimally, combine the terms most indicative of membership with those most indicative of non-membership. The proposed approach is compared experimentally with the standard approaches on four feature selection metrics: chi-square, correlation coefficient, odds ratio, and GSS coefficient. The results show that the proposed approach improves text categorization performance.
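
The selection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the correlation coefficient as the scoring metric, takes a positive score to indicate membership and a negative score to indicate non-membership, and uses hypothetical names (`select_features`, `size`, `neg_ratio`) that do not come from the paper. The ratio is left as a parameter, since the abstract only says it is chosen empirically.

```python
import math
from collections import Counter


def correlation_coefficient(a, b, c, d):
    """Signed correlation coefficient from a 2x2 term/category table.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents lacking the term
    d: out-of-category documents lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)


def select_features(docs, labels, size, neg_ratio):
    """Pick positive and negative features for one category.

    docs: list of sets of terms, one set per document
    labels: list of bools, True if the document belongs to the category
    size: total number of features to keep (hypothetical parameter)
    neg_ratio: fraction of `size` reserved for negative features
               (hypothetical parameter, to be tuned empirically)
    """
    n_in = sum(labels)
    n_out = len(labels) - n_in
    in_cat, out_cat = Counter(), Counter()
    for terms, is_member in zip(docs, labels):
        (in_cat if is_member else out_cat).update(terms)

    scores = {}
    for term in set(in_cat) | set(out_cat):
        a, b = in_cat[term], out_cat[term]
        scores[term] = correlation_coefficient(a, b, n_in - a, n_out - b)

    n_neg = round(size * neg_ratio)
    n_pos = size - n_neg
    # Highest scores first for positive features, lowest first for negative;
    # ties are broken alphabetically so the result is deterministic.
    by_pos = sorted(scores, key=lambda t: (-scores[t], t))
    by_neg = sorted(scores, key=lambda t: (scores[t], t))
    positive = [t for t in by_pos if scores[t] > 0][:n_pos]
    negative = [t for t in by_neg if scores[t] < 0][:n_neg]
    return positive, negative


# Toy corpus (made up for illustration): the first two documents are members.
docs = [
    {"wheat", "farm"},
    {"wheat", "export"},
    {"stock", "bank"},
    {"stock", "export"},
]
labels = [True, True, False, False]

positive, negative = select_features(docs, labels, size=4, neg_ratio=0.5)
feature_set = positive + negative  # union of the two disjoint sets
print(positive, negative)
```

On this toy corpus, "export" occurs equally often inside and outside the category, scores zero, and is selected by neither list. In practice, `neg_ratio` would be tuned per category on held-out data rather than fixed at 0.5.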
