Fuzzy Mutual Information Based min-Redundancy and Max-Relevance Heterogeneous Feature Selection

Feature selection is an important preprocessing step in pattern classification and machine learning, and mutual information is widely used to measure the relevance between features and the decision. However, mutual information is difficult to compute directly when the features are continuous or fuzzy. In this paper we introduce fuzzy information entropy and fuzzy mutual information for computing the relevance between numerical or fuzzy features and the decision. The relationship between fuzzy information entropy and differential entropy is also discussed. Moreover, we combine fuzzy mutual information with the "min-Redundancy-Max-Relevance", "Max-Dependency" and "min-Redundancy-Max-Dependency" algorithms. The performance and stability of the proposed algorithms are tested on benchmark data sets. Experimental results show that the proposed algorithms are effective and stable.
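The abstract does not spell out the definitions, so the following is only a minimal sketch of how fuzzy mutual information could drive a greedy min-Redundancy-Max-Relevance search. It assumes a common formulation from the fuzzy-rough literature: each feature induces an n-by-n fuzzy similarity relation R, the fuzzy entropy is H(R) = -(1/n) Σ_i log2(|[x_i]_R| / n) with |[x_i]_R| the i-th row sum, the joint relation is the elementwise minimum, and the score is relevance minus mean redundancy. The function names, the linear similarity kernel, and the difference-form score are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fuzzy_relation(x, nominal=False):
    """Fuzzy similarity matrix of one feature (assumed kernel, for illustration)."""
    x = np.asarray(x)
    if nominal:
        # Crisp equivalence relation: 1 if two samples share the value, else 0.
        return (x[:, None] == x[None, :]).astype(float)
    x = x.astype(float)
    span = x.max() - x.min()
    if span == 0:
        return np.ones((x.size, x.size))
    # Linear similarity in [0, 1]; the diagonal is always 1.
    return 1.0 - np.abs(x[:, None] - x[None, :]) / span

def fuzzy_entropy(R):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with row sums as cardinalities."""
    n = R.shape[0]
    return float(-np.mean(np.log2(R.sum(axis=1) / n)))

def fuzzy_mi(Ra, Rb):
    """FMI(A; B) = H(A) + H(B) - H(A, B); the joint relation is min(Ra, Rb)."""
    return fuzzy_entropy(Ra) + fuzzy_entropy(Rb) - fuzzy_entropy(np.minimum(Ra, Rb))

def fmi_mrmr(relations, Rd, k):
    """Greedily pick k feature indices by relevance minus mean redundancy."""
    relevance = [fuzzy_mi(R, Rd) for R in relations]
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, len(relations)):
        best, best_score = None, -np.inf
        for j, Rj in enumerate(relations):
            if j in selected:
                continue
            redundancy = np.mean([fuzzy_mi(Rj, relations[s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With a crisp (nominal) relation the row sums are equivalence-class sizes, so this entropy reduces to Shannon entropy. That is why one measure can serve nominal, numerical and fuzzy features alike, which is the heterogeneous setting the title refers to.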
