Unsupervised feature selection based on clustering

Feature selection plays an important part in improving classification accuracy and clustering quality in many applications. It has been widely studied in supervised learning but has received comparatively little attention in unsupervised learning. In this paper, a novel definition of feature differentiation is presented for identifying the relatively important features, and a one-pass clustering-based feature selection approach is introduced. The new method, which has nearly linear time complexity, selects the optimal subset according to the variation of the feature differentiation. Experimental results on UCI datasets show that, by removing irrelevant or redundant features, our method achieves promising classification and clustering results on most datasets. Compared with traditional feature selection approaches, the proposed algorithm obtains similar or even better performance in terms of dimensionality reduction and classification accuracy.
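The abstract does not state the definition of feature differentiation, so the following is only a rough sketch of the general idea (cluster in a single scan, score each feature, cut the ranking where the score changes most). Everything specific here is an assumption made for illustration and is not taken from the paper: the leader-style clustering with a fixed `radius`, the score (between-cluster variance divided by total variance for each feature), the largest-drop cutoff rule, and all function names.

```python
import math

def one_pass_cluster(rows, radius):
    """Single scan: join the nearest cluster within `radius`, else open a new one.
    (Assumed leader-style scheme; the paper's clustering step may differ.)"""
    centroids, counts, labels = [], [], []
    for row in rows:
        best, best_d = -1, float("inf")
        for k, c in enumerate(centroids):
            d = math.sqrt(sum((x - m) ** 2 for x, m in zip(row, c)))
            if d < best_d:
                best, best_d = k, d
        if best >= 0 and best_d <= radius:
            # Update the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [m + (x - m) / n for m, x in zip(centroids[best], row)]
        else:
            centroids.append(list(row))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return centroids, counts, labels

def differentiation_scores(rows, centroids, counts):
    """Hypothetical stand-in score: between-cluster variance / total variance."""
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        mean = sum(r[j] for r in rows) / n
        total = sum((r[j] - mean) ** 2 for r in rows)
        between = sum(cnt * (c[j] - mean) ** 2 for c, cnt in zip(centroids, counts))
        scores.append(between / total if total > 0 else 0.0)
    return scores

def select_features(scores):
    """Keep the features ranked before the largest drop in the sorted scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    if len(order) < 2:
        return order
    drops = [scores[order[i]] - scores[order[i + 1]] for i in range(len(order) - 1)]
    cut = drops.index(max(drops)) + 1
    return sorted(order[:cut])

# Toy data: feature 0 separates two groups, feature 1 is noise.
rows = [(0.0, 0.1), (0.2, 0.9), (0.1, 0.5), (10.0, 0.2), (10.1, 0.8), (9.9, 0.4)]
centroids, counts, labels = one_pass_cluster(rows, radius=3.0)
scores = differentiation_scores(rows, centroids, counts)
selected = select_features(scores)
```

On this toy input the scan forms two clusters, and only feature 0 survives the cutoff. The clustering step compares each point against the current centroids once, so its cost grows linearly with the number of points when the number of clusters stays small, which is in the spirit of the nearly linear complexity the abstract mentions.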
