Feature selection with conditional mutual information maximin in text categorization

Feature selection is an important component of text categorization. It can both speed up a classifier's computation and reduce overfitting. Several feature selection methods, such as information gain and mutual information, have been widely used. Although they greatly improve classifier performance, they share a common drawback: they do not consider the mutual relationships among the features. As a result, one feature's predictive power may be weakened by the others, and the selected features tend to be biased towards major categories; in such situations these selection methods are not very effective. In this paper, we propose a novel feature selection method for text categorization called conditional mutual information maximin (CMIM). It selects a set of features that are individually discriminating yet weakly dependent on one another. The experimental results show that CMIM can perform much better than traditional feature selection methods.
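To make the maximin idea concrete, below is a minimal sketch in Python of greedy selection under the criterion the abstract describes: the first feature maximizes I(X_f; Y), and each subsequent pick maximizes min over already-selected features s of I(X_f; Y | X_s), so a candidate must stay informative even given its most redundant selected partner. The function names, the naive count-based probability estimates, and the synthetic data are our own illustration under that assumed criterion, not the paper's implementation.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical I(X; Y) in bits for two discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        p_x = np.mean(x == xv)
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0.0:
                p_y = np.mean(y == yv)
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def conditional_mutual_information(x, y, z):
    """Empirical I(X; Y | Z) = sum over z of P(Z=z) * I(X; Y | Z=z)."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        cmi += np.mean(mask) * mutual_information(x[mask], y[mask])
    return cmi

def cmim_select(X, y, k):
    """Greedy maximin selection of k columns of the discrete matrix X:
    argmax over candidates f of min over selected s of I(X_f; y | X_s)."""
    selected = []
    remaining = set(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_f, best_score = None, -np.inf
        for f in remaining:
            if selected:
                # Worst-case informativeness given any selected feature.
                score = min(conditional_mutual_information(X[:, f], y, X[:, s])
                            for s in selected)
            else:
                # First pick: the individually most informative feature.
                score = mutual_information(X[:, f], y)
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=500)     # binary labels
    x0 = y ^ (rng.random(500) < 0.1)     # strongly informative feature
    x1 = x0 ^ (rng.random(500) < 0.05)   # near-duplicate of x0
    x2 = y ^ (rng.random(500) < 0.3)     # weaker but non-redundant feature
    x3 = rng.integers(0, 2, size=500)    # pure noise
    X = np.column_stack([x0, x1, x2, x3])
    # Typically selects the informative feature plus the non-redundant
    # one, skipping the near-duplicate that plain mutual information
    # ranking would keep.
    print(cmim_select(X, y, 2))
```

This naive version re-estimates every pairwise conditional mutual information at each step, so it is quadratic in the number of candidate features per pick; it is meant only to show the selection logic, not to scale to a realistic text-categorization vocabulary.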
