Weighted feature extraction with a functional data extension

Dimensionality reduction has proved to be a beneficial tool in learning problems. Two of its main advantages are interpretability and generalization. Typically, dimensionality reduction is addressed in two separate ways: variable selection and feature extraction. In recent years, however, there has been growing interest in combined schemes, such as feature extraction with built-in feature selection. In this paper, we view dimensionality reduction as a rank-deficient problem that embraces variable selection and feature extraction simultaneously. From our analysis, we derive a weighting algorithm that selects and linearly transforms variables by fixing the dimensionality of the space in which a relevance criterion is evaluated; this step enforces sparseness on the resulting weights. Our main goal is dimensionality reduction for classification problems. Specifically, we introduce modified, weighted versions of principal component analysis (PCA) via expectation maximization (EM) and of linear regularized discriminant analysis (RDA), the latter referred to as WRDA. Finally, we propose a simple extension of WRDA that deals with functional features; in this case, observations are described by a set of functions defined over the same domain. The methods were put to the test on artificial and real data sets, showing high levels of generalization even for small training samples.
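As background for the EM-based PCA mentioned above, the standard EM algorithm for PCA (in the style of Roweis's EM-PCA) alternates between inferring latent coordinates given the current loading matrix and re-estimating the loadings given those coordinates, recovering the principal subspace without an explicit eigendecomposition. The sketch below is illustrative only; the function name `em_pca`, the iteration count, and the final QR orthonormalization are our assumptions, not the paper's weighted variant.

```python
import numpy as np

def em_pca(X, k, n_iter=200, seed=0):
    """Illustrative EM algorithm for PCA (Roweis-style sketch).

    Returns an orthonormal basis Q (d x k) spanning the top-k
    principal subspace of the rows of X.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)            # center the data
    d = Xc.shape[1]
    W = rng.standard_normal((d, k))    # random initial loading matrix
    for _ in range(n_iter):
        # E-step: latent coordinates Z given current loadings W
        Z = np.linalg.solve(W.T @ W, W.T @ Xc.T)        # k x n
        # M-step: update loadings W given latent coordinates Z
        W = Xc.T @ Z.T @ np.linalg.inv(Z @ Z.T)         # d x k
    # Orthonormalize so the basis is comparable to SVD-based PCA
    Q, _ = np.linalg.qr(W)
    return Q
```

The span of `Q` converges to the span of the top-k right singular vectors of the centered data; only the subspace (not the individual directions) is identified, which is why comparisons are made between projection matrices rather than column-by-column.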
