A new feature subset selection using bottom-up clustering

Feature subset selection and/or dimensionality reduction is an essential preprocessing step before performing any data mining task, especially when the problem space contains many features. In this paper, a clustering-based feature subset selection (CFSS) algorithm is proposed to identify the most relevant features. At each level of agglomeration, it uses a similarity measure between features to merge the two most similar clusters of features. By gathering similar features into clusters and then introducing a representative feature for each cluster, it removes redundant features. To identify the representative features, a criterion based on mutual information is proposed. Since CFSS specifies the representatives in a filter manner, it is noticeably fast. As an advantage of hierarchical clustering, it does not require the number of clusters to be determined in advance. In CFSS, the clustering process is repeated until all features are distributed among clusters. To distribute the features into a reasonable number of clusters, however, a recently proposed approach is used to find a suitable level for cutting the clustering tree. To assess the performance of CFSS, we have applied it to several standard UCI datasets and compared it with some popular feature selection methods. The experimental results demonstrate the efficiency and speed of the proposed method.
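The pipeline the abstract describes — bottom-up (agglomerative) clustering of features by similarity, followed by selecting one representative per cluster via mutual information with the class labels — can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: absolute correlation as the feature-similarity measure, average linkage, a histogram-based mutual information estimate, and a fixed number of clusters (in place of the tree-cutting approach) are all assumptions made here.

```python
import numpy as np

def mutual_information(x, y, bins=5):
    # Histogram-based estimate of MI between one feature and the labels
    # (a simple stand-in for the paper's MI criterion).
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def cfss_sketch(X, y, n_clusters):
    # Agglomerative clustering of FEATURES: start with singleton clusters
    # and repeatedly merge the two most similar ones.
    clusters = [[j] for j in range(X.shape[1])]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # assumed similarity measure
    while len(clusters) > n_clusters:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average-linkage similarity between two feature clusters.
                sim = corr[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Representative of each cluster: the feature with maximal MI with y.
    reps = [max(c, key=lambda j: mutual_information(X[:, j], y))
            for c in clusters]
    return sorted(reps)
```

Because the representatives are chosen by a filter criterion (mutual information with the labels) rather than by training a classifier per candidate subset, the selection step stays cheap, which is consistent with the speed the abstract claims.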
