Feature Selection With $\ell_{2,1-2}$ Regularization

Feature selection aims to select a subset of features from high-dimensional data according to a predefined selection criterion. Sparse learning has proven to be a powerful technique for feature selection, and sparse regularizers, a key component of sparse learning, have been studied for several years. Although convex regularizers have been used in many works, there are cases where nonconvex regularizers outperform them. To make the selection of relevant features more effective, we propose a novel nonconvex sparse metric on matrices as the sparsity regularizer. The new nonconvex regularizer can be written as the difference of the $\ell_{2,1}$ norm and the Frobenius ($\ell_{2,2}$) norm, and is named $\ell_{2,1-2}$. To solve the resulting nonconvex problem, we design an iterative algorithm in the framework of the ConCave–Convex Procedure (CCCP) and prove its strong global convergence. An adopted alternating direction method of multipliers is embedded to solve the sequence of convex subproblems in CCCP efficiently. Using the scaled cluster indicators of data points as pseudolabels, we also apply $\ell_{2,1-2}$ to the unsupervised case. To the best of our knowledge, this is the first work to consider nonconvex regularization for matrices in the unsupervised learning scenario. Numerical experiments on real-world data sets demonstrate the effectiveness of the proposed method.
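
As a rough illustration of the regularizer described above, the following minimal sketch evaluates the difference between the $\ell_{2,1}$ norm and the Frobenius norm of a weight matrix. It assumes that each row of `W` corresponds to one feature, which the abstract does not state. The function names (`l21_norm`, `l21_minus_2`, `rank_features`) are hypothetical and are not taken from the paper. The sketch only evaluates the penalty; it does not implement the CCCP or ADMM solver.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the l_{2,1} norm)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def l21_minus_2(W):
    """Difference of the l_{2,1} norm and the Frobenius (l_{2,2}) norm."""
    return l21_norm(W) - float(np.linalg.norm(W, "fro"))


def rank_features(W):
    """Assumed convention: order features (rows) by decreasing row norm."""
    return np.argsort(-np.linalg.norm(W, axis=1))


# A matrix with a single nonzero row: both norms coincide, so the penalty vanishes.
W_row_sparse = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])

# A matrix whose energy is spread over two rows: the penalty is strictly positive.
W_dense = np.array([[3.0, 4.0], [3.0, 4.0]])

penalty_sparse = l21_minus_2(W_row_sparse)
penalty_dense = l21_minus_2(W_dense)
```

The penalty is never negative, because the $\ell_{2,1}$ norm is at least as large as the Frobenius norm, and it is zero when at most one row is nonzero. This suggests why it may favor row-sparse solutions, although the paper's own analysis should be consulted for the precise claims.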
