Direct Sparsity Optimization Based Feature Selection for Multi-Class Classification

A novel sparsity optimization method is proposed to select features for multi-class classification problems by directly optimizing an ℓ2,p-norm (0 < p ≤ 1) based sparsity function subject to data-fitting inequality constraints, so as to obtain large between-class margins. This direct sparse optimization circumvents the empirical tuning of regularization parameters required by existing feature selection methods that adopt the sparsity model as a regularization term. To solve the direct sparsity optimization problem, which is nonsmooth and non-convex when 0 < p < 1, we propose an efficient iterative algorithm with proven convergence that converts it into a convex, smooth optimization problem at every iteration step. The proposed algorithm has been evaluated on publicly available datasets, and the experiments demonstrate that it achieves feature selection performance competitive with state-of-the-art algorithms.
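The abstract does not spell out the iterative algorithm, but the description matches the standard reweighted strategy for ℓ2,p minimization (cf. FOCUSS and joint ℓ2,1-norm minimization). The Python sketch below is a minimal illustration of that strategy, not the paper's implementation: equality constraints X W = Y stand in for the paper's margin inequality constraints, and the function names, iteration budget, and eps smoothing floor are assumptions made for the example.

import numpy as np

def l2p_row_norms(W, eps=1e-8):
    """Euclidean norm of each row of W, floored at eps for numerical stability."""
    return np.maximum(np.linalg.norm(W, axis=1), eps)

def reweighted_l2p(X, Y, p=0.5, n_iter=50, eps=1e-8):
    """Minimize sum_i ||w_i||_2^p subject to X @ W = Y by iterative reweighting.

    Each step replaces the nonsmooth, non-convex objective with the smooth
    convex surrogate tr(W.T @ D @ W), whose constrained minimizer has the
    closed form W = D^{-1} X^T (X D^{-1} X^T)^{-1} Y.
    """
    W = np.linalg.pinv(X) @ Y                 # least-norm initial guess
    for _ in range(n_iter):
        # d_i is proportional to ||w_i||^(p-2); only the relative scale
        # matters, so store the entries of D^{-1} directly.
        inv_d = l2p_row_norms(W, eps) ** (2.0 - p)
        XDinv = X * inv_d                     # X @ D^{-1} via broadcasting
        W = XDinv.T @ np.linalg.solve(XDinv @ X.T, Y)
    return W

# Toy usage: try to recover a row-sparse W from exact measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
W_true = np.zeros((100, 3))
W_true[:5] = rng.standard_normal((5, 3))
Y = X @ W_true
W_hat = reweighted_l2p(X, Y, p=0.5)
print("active rows:", int(np.sum(np.linalg.norm(W_hat, axis=1) > 1e-4)))

Each iteration solves a smooth convex weighted minimum-norm problem in closed form; rows of W whose norms shrink toward zero correspond to features that can be discarded, which is the behavior the nonsmooth objective is designed to induce.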
