Robust Unsupervised Feature Selection

We propose a new unsupervised feature selection method, Robust Unsupervised Feature Selection (RUFS). Unlike traditional unsupervised feature selection methods, RUFS learns pseudo cluster labels via local learning regularized robust nonnegative matrix factorization. Feature selection is performed simultaneously with label learning through robust joint ℓ2,1-norm minimization. Because RUFS applies ℓ2,1-norm minimization to both label learning and feature learning, it can handle outliers and noise effectively and reduce redundant or noisy features. The method thus combines the advantages of robust nonnegative matrix factorization, local learning, and robust feature learning. To make RUFS scalable, we design a (projected) limited-memory BFGS based iterative algorithm that solves the RUFS optimization problem efficiently in terms of both memory consumption and computational complexity. Experimental results on several real-world benchmark datasets show the promising performance of RUFS compared with state-of-the-art methods.
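
To make the role of the ℓ2,1 norm concrete, the following is a minimal sketch, not the RUFS algorithm itself. It computes the ℓ2,1 norm of a matrix (the sum of the Euclidean norms of its rows) and ranks features by row norm, which is how row-sparse projection matrices are commonly used for feature selection. The matrix `W`, its values, and the helper names `l21_norm` and `rank_features` are hypothetical and chosen only for illustration; the abstract does not specify how RUFS scores features.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the Euclidean (l2) norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def rank_features(W):
    """Return feature (row) indices sorted by decreasing row norm."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms, kind="stable")

# Hypothetical projection matrix: 4 features (rows) x 2 pseudo clusters (columns).
# Values are illustrative only and are not taken from the paper.
W = np.array([
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0: this feature contributes nothing
    [1.0, 0.0],   # row norm 1
    [0.0, 2.0],   # row norm 2
])

print(l21_norm(W))       # 8.0
print(rank_features(W))  # [0 3 2 1]
```

Minimizing this norm drives entire rows toward zero, so features whose rows vanish (feature 1 in the sketch) can be dropped, while the unsquared per-row norm penalizes large entries less heavily than a squared Frobenius penalty would.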
