Heterogeneous feature subset selection using mutual information-based feature transformation

Conventional mutual information (MI) based feature selection (FS) methods cannot handle heterogeneous feature subset selection properly, owing to differences in data formats and in the methods used to estimate the MI between a feature subset and the class label. Feature transformation (FT) offers a way around this problem. In this study, a novel unsupervised feature transformation (UFT), which transforms non-numerical features into numerical ones, is developed and tested. The UFT process is MI-based and independent of the class label. MI-based FS algorithms, such as the Parzen window feature selector (PWFS), minimum redundancy maximum relevance feature selection (mRMR), and normalized MI feature selection (NMIFS), can all adopt UFT to pre-process non-numerical features. Unlike traditional FT methods, the proposed UFT is unbiased, allowing PWFS to be used to its full advantage. Simulations and analyses on large-scale datasets showed that the feature subsets selected by the integrated method, UFT-PWFS, outperformed those of other FT-FS integrated methods in classification accuracy.
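The MI-based selectors named above (e.g. mRMR) score each candidate feature by its relevance to the class label minus its average redundancy with already-selected features. The following is a minimal sketch of that criterion for discrete-valued features, using plug-in probability estimates; it is an illustrative assumption, not the paper's own implementation (which uses Parzen-window MI estimation).

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) for discrete sequences x, y (in nats)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, px[a], py[b]
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return mi

def mrmr_select(features, labels, k):
    """Greedy mRMR: pick k features maximizing relevance I(f;y)
    minus mean redundancy with the features already selected."""
    selected = []
    remaining = list(range(len(features)))
    relevance = [mutual_information(f, labels) for f in features]
    while len(selected) < k and remaining:
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = (np.mean([mutual_information(features[j], features[s])
                                   for s in selected]) if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

UFT would sit in front of such a selector, converting non-numerical features into a numerical form so that a single MI estimator can score a heterogeneous subset consistently.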
