Unsupervised Relevance Analysis for Feature Extraction and Selection - A Distance-based Approach for Feature Relevance

This paper proposes a new generalized formulation for distance-based feature extraction, viewed from a feature relevance standpoint and developed within an unsupervised framework. First, the formal concept of feature relevance is outlined. Then, a novel feature extraction approach is introduced that employs the M-norm as a distance measure. It is demonstrated that, under some conditions, this method can readily explain methods from the literature. As a further contribution, we propose a feature ranking approach for feature selection derived from the spectral analysis of the data variability. We also provide a weighted PCA scheme that reveals the relationship between feature extraction and feature selection. To assess the behavior of the studied methods within a pattern recognition system, a clustering stage is carried out, and normalized mutual information is used to quantify the quality of the resulting clusters. The proposed methods achieve results comparable to those of methods from the literature.
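The abstract does not give the formulas, so the following is only an illustrative sketch of the two ingredients it names: a distance induced by an M-norm, and a feature ranking obtained from the spectrum of the data covariance. The function names, the relevance score (an eigenvalue-weighted sum of squared eigenvector entries), and the normalization are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def m_norm_distance(x, y, M):
    """Distance induced by the M-norm: sqrt((x - y)^T M (x - y)).

    M is assumed to be symmetric positive semidefinite.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))


def spectral_feature_relevance(X, n_components=None):
    """Hypothetical relevance score per feature (an assumption, not the paper's).

    rho_i = sum_j lambda_j * v_ij^2 over the leading eigenpairs of the
    sample covariance, normalized so that the scores sum to one.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None) # guard against round-off
    eigvecs = eigvecs[:, order]
    q = n_components or X.shape[1]               # default: keep all components
    rho = (eigvecs[:, :q] ** 2) @ eigvals[:q]
    return rho / rho.sum()


def rank_features(X, n_components=None):
    """Feature indices sorted from most to least relevant."""
    return np.argsort(spectral_feature_relevance(X, n_components))[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three independent features with very different spreads.
    X = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 0.1])
    print(rank_features(X))
    print(m_norm_distance([0, 0], [3, 4], np.eye(2)))
```

With M set to the identity, the M-norm distance reduces to the Euclidean distance. When all components are kept, this particular score equals each feature's share of the total variance, so truncating to the leading components is what makes the ranking differ from a plain variance ranking.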
