Mutual Information-Based Unsupervised Feature Transformation for Heterogeneous Feature Subset Selection

Conventional mutual information (MI) based feature selection (FS) methods cannot properly handle heterogeneous feature subset selection because of differences in data formats and in the methods used to estimate MI between a feature subset and the class label. One way to address this problem is feature transformation (FT). In this study, a novel unsupervised feature transformation (UFT), which transforms non-numerical features into numerical ones, is developed and tested. The UFT process is MI-based and independent of the class label. MI-based FS algorithms, such as the Parzen window feature selector (PWFS), minimum redundancy maximum relevance feature selection (mRMR), and normalized MI feature selection (NMIFS), can all adopt UFT for pre-processing non-numerical features. Unlike traditional FT methods, the proposed UFT is unbiased and allows PWFS to be used to its full advantage. Simulations and analyses on large-scale datasets showed that the feature subsets selected by the integrated method, UFT-PWFS, outperformed those of other FT-FS integrated methods in classification accuracy.
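The paper's UFT algorithm itself is not reproduced here. As a minimal sketch of the histogram (count) based MI relevance criterion that filters such as mRMR and NMIFS build on, the hypothetical helper `mutual_information` below estimates MI between a discrete feature and a class label directly from empirical frequencies; the function name and toy data are illustrative assumptions, not from the paper.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Histogram-based MI estimate (in bits) between two discrete variables.

    Illustrative helper, not the paper's UFT: it computes
    sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ) from empirical counts.
    """
    n = len(x)
    px = Counter(x)            # marginal counts of the feature values
    py = Counter(y)            # marginal counts of the class labels
    pxy = Counter(zip(x, y))   # joint counts of (feature, label) pairs
    mi = 0.0
    for (xv, yv), c in pxy.items():
        p_joint = c / n
        p_indep = (px[xv] / n) * (py[yv] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# Toy example: a nominal feature perfectly aligned with the class label.
feature = ["red", "red", "blue", "blue"]
label = [0, 0, 1, 1]
print(mutual_information(feature, label))  # 1.0 bit: perfectly informative
```

Note that this count-based estimate works only because the feature is already nominal and low-cardinality; the abstract's point is that heterogeneous data mixes such features with continuous ones (where Parzen-window or other density estimators are needed), which is exactly the mismatch UFT is designed to remove.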
