Unsupervised Feature Selection through Fitness Proportionate Sharing Clustering

As an effective dimensionality reduction technique, feature selection is widely used as a preprocessing step in data mining. It is valued for its ability to mitigate the effect of noisy data and to simplify the analysis of high-dimensional data. In this paper, a novel unsupervised feature selection procedure based on a clustering algorithm is proposed to evaluate the goodness of features and select a set of useful features without losing the characteristics of the data. The procedure consists of two steps: clustering and feature evaluation. In the clustering step, a novel clustering algorithm based on fitness proportionate sharing is adopted to separate the data into distinct clusters without any prior knowledge about the data, which makes it well suited to the analysis of unknown datasets. The feature evaluation step then uses the information extracted during clustering to evaluate the usefulness of each feature and select good features. The proposed method is compared in simulation with four well-known existing feature selection algorithms. Simulation results on both synthetic and real datasets demonstrate that the proposed procedure can effectively evaluate the significance of features and obtain a better subset of features than the four existing algorithms.
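The abstract describes the two-step pipeline only at a high level, so the following Python sketch illustrates its general shape under explicitly stated assumptions. It uses the classic sharing function sh(d) = 1 - (d/σ)^α from niching genetic algorithms to compute each point's niche count. It then picks niche peaks greedily, so the number of clusters is not fixed in advance, and finally scores each feature by the ratio of between-cluster to total variance. The radius `sigma`, the greedy peak picking, the nearest-peak assignment, and the variance-ratio score are all hypothetical stand-ins. They are not the authors' fitness-proportionate-sharing clustering or their feature-evaluation criterion.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sharing(d: float, sigma: float, alpha: float = 1.0) -> float:
    """Classic niching sharing function: 1 - (d/sigma)^alpha inside the radius, 0 outside."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0


def niche_counts(points: List[Point], sigma: float) -> List[float]:
    """Niche count m_i = sum_j sh(d_ij); in fitness sharing, f_i / m_i is the shared fitness."""
    return [sum(sharing(math.dist(p, q), sigma) for q in points) for p in points]


def cluster(points: List[Point], sigma: float) -> List[int]:
    """Hypothetical step 1: greedy niche-peak picking, then nearest-peak assignment.

    The number of clusters is not an input; it follows from the choice of sigma.
    """
    counts = niche_counts(points, sigma)
    order = sorted(range(len(points)), key=lambda i: -counts[i])  # most crowded first
    centres: List[int] = []
    for i in order:
        # Accept i as a new peak only if it lies outside every accepted peak's niche.
        if all(math.dist(points[i], points[c]) >= sigma for c in centres):
            centres.append(i)
    # Label each point with the index of its nearest peak.
    return [min(range(len(centres)), key=lambda k: math.dist(p, points[centres[k]]))
            for p in points]


def feature_scores(points: List[Point], labels: List[int]) -> List[float]:
    """Hypothetical step 2: between-cluster variance divided by total variance, per feature."""
    n, dims = len(points), len(points[0])
    scores = []
    for f in range(dims):
        col = [p[f] for p in points]
        mean = sum(col) / n
        total = sum((x - mean) ** 2 for x in col)
        between = 0.0
        for k in set(labels):
            members = [x for x, lab in zip(col, labels) if lab == k]
            between += len(members) * (sum(members) / len(members) - mean) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # Feature 0 separates two groups; feature 1 is identically distributed in both.
    data = [(0.0, 0.0), (0.2, 1.0), (0.1, 0.5), (5.0, 0.0), (5.2, 1.0), (5.1, 0.5)]
    labels = cluster(data, sigma=1.5)
    scores = feature_scores(data, labels)
    ranking = sorted(range(len(scores)), key=lambda f: -scores[f])
    print(labels, scores, ranking)
```

On this toy data the sketch finds two clusters without being told how many to look for, and it ranks the separating feature above the noise feature. A feature whose values are distributed the same way in every cluster receives a score of zero under the assumed variance ratio.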
