G-Optimal Feature Selection with Laplacian Regularization

Feature selection is an important preprocessing step in many applications where the data points are high-dimensional. Its goal is to find the most informative feature subset to facilitate data visualization, clustering, classification, and ranking. In this paper, we consider the feature selection problem in the unsupervised scenario. Typical unsupervised feature selection algorithms, such as Laplacian Score, select the most informative features by discovering the clustering or geometrical structure of the data. However, they fail to consider how the selected features affect the performance of a specific learning task, e.g. regression. Based on Laplacian Regularized Least Squares (LapRLS), which incorporates the manifold structure of the data into the regression model, we propose a novel feature selection approach called Laplacian G-Optimal Feature Selection (LapGOFS). It minimizes the maximum variance of the predicted values of the regression model. By combining techniques from manifold learning and optimal experimental design, the proposed approach selects the features that improve the learning performance the most. Extensive experiments on several real-world data sets demonstrate the effectiveness of the proposed algorithm.
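The G-optimality criterion described above can be illustrated with a small sketch: build a graph Laplacian over the data, and for each candidate feature subset evaluate the maximum predictive variance of the LapRLS estimator over the training points, selecting features greedily. This is only an illustrative surrogate under stated assumptions, not the paper's algorithm; the greedy forward search, the unweighted kNN graph, and the regularization parameters `lam1`, `lam2` are choices made here for the sketch.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetric kNN graph (an assumption;
    a heat-kernel weighting is another common choice)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]   # nearest neighbors, skipping self
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize the graph
    return np.diag(W.sum(1)) - W

def max_pred_variance(X_S, L, lam1=0.1, lam2=0.01):
    """Largest predictive variance of LapRLS over the training points,
    up to the noise factor sigma^2. For w = A^{-1} X^T y with
    A = X^T X + lam1 X^T L X + lam2 I, Cov(w) ~ A^{-1} X^T X A^{-1}."""
    A = X_S.T @ X_S + lam1 * (X_S.T @ L @ X_S) + lam2 * np.eye(X_S.shape[1])
    Ainv = np.linalg.inv(A)
    C = Ainv @ X_S.T @ X_S @ Ainv
    return max(x @ C @ x for x in X_S)     # G-optimality objective

def greedy_gofs(X, L, m, lam1=0.1, lam2=0.01):
    """Forward-greedy surrogate for the LapGOFS criterion: at each step,
    add the feature that most reduces the maximum predictive variance."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(m):
        scores = [(max_pred_variance(X[:, selected + [j]], L, lam1, lam2), j)
                  for j in remaining]
        _, best = min(scores)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Greedy search is used here only to keep the sketch short; exhaustively scoring all feature subsets is exponential, and the paper derives a more principled optimization of the same objective.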
