Conditional Mutual Information Based Feature Selection for Classification Task

We propose a sequential forward feature selection method that finds a subset of features most relevant to the classification task. Our approach uses a novel estimate of the conditional mutual information between a candidate feature and the classes, given the subset of already selected features, and this estimate serves as a classifier-independent criterion for evaluating feature subsets. The proposed mMIFS-U algorithm is applied to a text classification problem and compared with the MIFS and MIFS-U methods proposed by Battiti and by Kwak and Choi, respectively. In experiments on high-dimensional Reuters text data, our feature selection algorithm outperforms both MIFS and MIFS-U.
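The abstract does not spell out the exact mMIFS-U conditional mutual information estimate, so the sketch below is only an illustration of the general scheme it builds on: greedy sequential forward selection in which each candidate feature is scored by its relevance to the class minus an entropy-normalized redundancy penalty against the already selected features (the MIFS-U style score of Kwak and Choi). The function name mifs_u_forward_selection, the default beta, and the assumption of discrete feature values are illustrative choices, not the paper's implementation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def entropy(x):
    """Shannon entropy (in nats) of a discrete variable given as a 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def mifs_u_forward_selection(X, y, k, beta=0.7):
    """Greedy sequential forward selection with an MIFS-U style score (sketch).

    X    : (n_samples, n_features) array of discrete feature values
    y    : (n_samples,) array of class labels
    k    : number of features to select
    beta : weight of the redundancy penalty

    Candidate feature f is scored against the selected set S by
        J(f) = I(C; f) - beta * sum_{s in S} [I(C; s) / H(s)] * I(f; s).
    """
    n_features = X.shape[1]
    # Relevance I(C; f_j) of every feature to the class variable.
    relevance = np.array([mutual_info_score(y, X[:, j]) for j in range(n_features)])

    selected, remaining = [], list(range(n_features))
    while len(selected) < k and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            # Entropy-normalized redundancy with already selected features.
            redundancy = sum(
                (relevance[s] / max(entropy(X[:, s]), 1e-12))
                * mutual_info_score(X[:, j], X[:, s])
                for s in selected
            )
            score = relevance[j] - beta * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

For text data, X would typically hold discretized (e.g., binarized) term counts; the greedy loop then picks k terms one at a time, trading off relevance to the class labels against redundancy with terms already chosen.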

[1] Ron Kohavi et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[2] Thorsten Joachims et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[3] Yiming Yang et al. A study of thresholding strategies for text categorization, 2001, SIGIR '01.

[4] Fabrizio Sebastiani et al. Machine learning in automated text categorization, 2001, CSUR.

[5] Fuhui Long et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] F. Fleuret. Fast Binary Feature Selection with Conditional Mutual Information, 2004, J. Mach. Learn. Res.

[8] Roberto Battiti et al. Using mutual information for selecting features in supervised neural net learning, 1994, IEEE Trans. Neural Networks.

[9] Anil K. Jain et al. Statistical Pattern Recognition: A Review, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Huan Liu et al. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, 2003, ICML.

[11] Huan Liu et al. Consistency-based search in feature selection, 2003, Artif. Intell.

[12] Thomas M. Cover et al. Elements of Information Theory, 2005.

[13] Chong-Ho Choi et al. Input feature selection for classification problems, 2002, IEEE Trans. Neural Networks.

[14] Huan Liu et al. Toward integrating feature selection algorithms for classification and clustering, 2005, IEEE Transactions on Knowledge and Data Engineering.

[15] George Forman et al. An Extensive Empirical Study of Feature Selection Metrics for Text Classification, 2003, J. Mach. Learn. Res.

[16] Andrew McCallum et al. A comparison of event models for naive Bayes text classification, 1998, AAAI 1998.

[17] Chong-Ho Choi et al. Improved mutual information feature selector for neural networks in supervised learning, 1999, IJCNN'99 International Joint Conference on Neural Networks Proceedings (Cat. No. 99CH36339).

[18] Huan Liu et al. Feature selection for clustering - a filter solution, 2002, IEEE International Conference on Data Mining, Proceedings.