An information-theoretic approach for feature selection

Feature selection methods play a significant role in the classification of high-dimensional data: they select the most relevant subset of features, one that still describes the data adequately. Mutual information (MI), rooted in information theory, is one metric used to measure the relevance of features. This paper analyses various feature selection methods with respect to (1) the reduction in the number of features and (2) the performance of a Naive Bayes classification model trained on the reduced feature set. Two research gaps are identified: (1) MI is typically computed over the whole sample space rather than over the subspace of still-unclassified samples; (2) existing methods consider only the relevance of features, or a tradeoff between relevance and redundancy, while ignoring class-conditional interaction among features. We propose a general MI-based evaluation function for feature selection and implement it using MI values computed dynamically from the unclassified instances. The effectiveness of the proposed feature selection method is evaluated empirically by comparing classification results on the KDD Cup 1999 benchmark intrusion detection dataset. The results indicate the practicability and effectiveness of the proposed method for applications that demand high accuracy and stable predictions. Copyright © 2011 John Wiley & Sons, Ltd.
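As a rough illustration of the dynamic-MI idea (not the authors' exact evaluation function), the following Python sketch performs greedy forward selection, recomputing MI only over the instances that the already-selected features cannot yet discriminate. It assumes discrete (or pre-discretized) features and uses scikit-learn's mutual_info_score; the function name dynamic_mi_select and the purity-based retirement rule are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import mutual_info_score  # discrete MI between two label arrays


def dynamic_mi_select(X, y, n_features):
    """Greedy forward feature selection with dynamically recomputed MI.

    X : (n_samples, n_total) array of discrete feature values
    y : (n_samples,) array of class labels
    Returns the indices of the selected features.
    """
    n_samples, n_total = X.shape
    remaining = list(range(n_total))
    selected = []
    active = np.ones(n_samples, dtype=bool)  # instances not yet discriminated

    while remaining and len(selected) < n_features and active.any():
        idx = np.flatnonzero(active)
        # Score each candidate by its MI with the class, computed on the
        # unclassified subspace only (not the whole sample space).
        _, best = max((mutual_info_score(y[idx], X[idx, f]), f) for f in remaining)
        selected.append(best)
        remaining.remove(best)

        # Retire instances whose pattern of selected-feature values now maps
        # to a single class: they contribute no further information.
        classes_per_pattern = defaultdict(set)
        for i in idx:
            classes_per_pattern[tuple(X[i, selected])].add(y[i])
        for i in idx:
            if len(classes_per_pattern[tuple(X[i, selected])]) == 1:
                active[i] = False
    return selected


# Toy run on a tiny discrete matrix (KDD-style features would be discretized first).
X = np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 1]])
y = np.array([0, 0, 1, 1])
print(dynamic_mi_select(X, y, n_features=2))  # e.g. [2]: one feature already separates the classes
```

Note that recomputing MI on the shrinking unclassified subspace implicitly penalizes redundancy: a feature that merely duplicates information carried by already-selected features scores low on the remaining instances, even without an explicit relevance-redundancy tradeoff term.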
