Robust feature selection via simultaneous capped ℓ2-norm and ℓ2,1-norm minimization

High dimensionality is one of the key characteristics of big data. Feature selection plays a significant role in many machine learning applications dealing with high-dimensional data. To improve the robustness of feature selection, we propose a new robust feature selection method based on Simultaneous Capped ℓ2-norm loss and ℓ2,1-norm regularizer Minimization (SCM). The capped ℓ2-norm loss effectively eliminates the influence of noise and outliers in regression, while the ℓ2,1-norm regularizer selects features across all data points with joint sparsity. We also propose an efficient algorithm to solve the resulting minimization problem. Experimental studies on real-world data sets demonstrate the effectiveness of our method in comparison with other popular feature selection methods.
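
The abstract does not give the objective explicitly; a plausible form of the SCM problem, assuming a projection matrix W, a capping threshold ε, and a trade-off parameter γ (none of which are specified above), is

\[
\min_{W}\; \sum_{i=1}^{n} \min\!\left(\lVert W^{\top} x_i - y_i \rVert_2,\; \varepsilon\right) \;+\; \gamma\, \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{j=1}^{d} \lVert w^{j} \rVert_2 ,
\]

where w^j denotes the j-th row of W. Under this formulation, any residual larger than ε contributes only the constant ε to the loss, so noisy samples and outliers cannot dominate the fit, while the row-wise ℓ2,1 penalty drives entire rows of W to zero, which corresponds to discarding the associated features jointly across all outputs.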
