Minimax Sparse Logistic Regression for Very High-Dimensional Feature Selection

Because of its strong convexity and probabilistic underpinnings, logistic regression (LR) is widely used in real-world applications. In many problems, however, such as bioinformatics, choosing a small subset of features with the most discriminative power is desirable for interpreting the prediction model, making robust predictions, or conducting deeper analysis. To achieve a solution that is sparse with respect to the input features, many sparse LR models have been proposed. It remains challenging, however, for these models to efficiently obtain unbiased sparse solutions to very high-dimensional problems (e.g., identifying the most discriminative subset from millions of features). In this paper, we propose a new minimax sparse LR model for very high-dimensional feature selection, which can be solved efficiently by a cutting plane algorithm. To solve the resulting nonsmooth minimax subproblems, a smoothing coordinate descent method is presented, and its numerical issues and convergence rate are carefully studied. Experimental results on several synthetic and real-world datasets show that, compared with baseline methods including the l1-regularized LR, the proposed method obtains better prediction accuracy with the same number of selected features and has better or competitive scalability on very high-dimensional problems.
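To make the setting concrete, here is a minimal sketch of the formulations contrasted above; the symbols $\lambda$, $B$, $\mu$, and $f_j$ are illustrative and not taken verbatim from the paper. Given training pairs $(x_i, y_i)$, $i = 1, \dots, n$, with labels $y_i \in \{-1, +1\}$, the $\ell_1$-regularized LR baseline solves
$$
\min_{w \in \mathbb{R}^m} \; \lambda \|w\|_1 + \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, w^\top x_i)\bigr),
$$
which yields sparse but shrinkage-biased coefficients. Budgeting the sparsity explicitly instead gives an $\ell_0$-constrained problem of the form
$$
\min_{\|w\|_0 \le B} \; \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, w^\top x_i)\bigr),
$$
and it is a convex minimax reformulation of this kind of problem that the cutting plane algorithm attacks, adding one violated constraint (cutting plane) per iteration. For the nonsmooth subproblems, a standard smoothing device, presumably in the spirit used here, replaces a finite maximum $\max_{1 \le j \le k} f_j(w)$ by the smooth log-sum-exp surrogate
$$
\max_{1 \le j \le k} f_j(w) \;\le\; \frac{1}{\mu} \log \sum_{j=1}^{k} \exp\bigl(\mu f_j(w)\bigr) \;\le\; \max_{1 \le j \le k} f_j(w) + \frac{\log k}{\mu},
$$
so that coordinate descent can be applied, with the approximation tightening as $\mu \to \infty$.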
