$l_{2,p}$ Matrix Norm and Its Application in Feature Selection

Recently, the $l_{2,1}$ matrix norm has been widely applied in areas such as computer vision, pattern recognition, and biological studies. As an extension of the $l_1$ vector norm, the mixed $l_{2,1}$ matrix norm is often used to find jointly sparse solutions, and an efficient iterative algorithm has been designed to solve minimization problems involving the $l_{2,1}$-norm. Computational studies have shown that $l_p$-regularization ($0<p<1$) yields sparser solutions than $l_1$-regularization, but its extension to matrix norms has seldom been considered. This paper presents a definition of the mixed $l_{2,p}$ ($p\in (0, 1]$) matrix pseudo-norm, which can be viewed both as a generalization of the $l_p$ vector norm to matrices and as a generalization of the $l_{2,1}$-norm to the nonconvex case ($0<p<1$). An efficient unified algorithm is proposed to solve the induced $l_{2,p}$-norm ($p\in (0, 1]$) optimization problems, and its convergence is demonstrated uniformly for all $p\in (0, 1]$. Typical values of $p\in (0,1]$ are applied to feature selection in computational biology, and the experimental results show that some choices of $0<p<1$ improve on the sparsity pattern obtained with $p=1$.
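
The abstract does not state the formula for the mixed norm. As a minimal sketch, assume the commonly used row-wise definition $\|W\|_{2,p} = \big(\sum_i \|w^i\|_2^p\big)^{1/p}$, where $w^i$ is the $i$-th row of $W$; the paper's own convention may differ. The function name `l2p_norm` and the example matrix are hypothetical, chosen only for illustration.

```python
import math

def l2p_norm(W, p):
    """Mixed l_{2,p} (pseudo-)norm of a matrix given as a list of rows.

    Assumed definition (not taken from the abstract): take the l_2 norm
    of each row, then the l_p (pseudo-)norm of those row norms.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    # l_2 norm of every row
    row_norms = [math.sqrt(sum(x * x for x in row)) for row in W]
    # l_p (pseudo-)norm of the vector of row norms
    return sum(r ** p for r in row_norms) ** (1.0 / p)

# Rows have l_2 norms 5, 0 and 2; the all-zero row contributes nothing.
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 2.0]]
print(l2p_norm(W, 1.0))  # l_{2,1} case: 5 + 0 + 2
print(l2p_norm(W, 0.5))  # nonconvex case: (sqrt(5) + sqrt(2)) ** 2
```

Under this assumed definition, a zero row adds nothing to the penalty, which is why minimizing it favors jointly sparse solutions in which whole rows (features) are dropped.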
