Towards ultrahigh dimensional feature selection for big data

In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To solve this SIP problem, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that optimize over all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. This optimization scheme also enables several efficient caching techniques. The feature generating paradigm is guaranteed to converge globally under mild conditions and achieves lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets with tens of millions of data points and O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
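The feature generating paradigm described above can be illustrated with a simplified, self-contained sketch. This is a toy stand-in, not the paper's exact algorithm: each outer iteration scores the inactive features by the magnitude of the loss gradient (a proxy for the "most violated" features), activates a small budget of them, and re-solves an l1-regularized logistic subproblem with FISTA, which plays the role of the primal MKL subproblem solved by the modified accelerated proximal gradient method. All function names and parameter choices below are illustrative assumptions.

```python
import numpy as np

def prox_l1(w, t):
    """Soft-thresholding: proximal operator of t * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def loss_grad(w, X, y):
    """Mean logistic loss and its gradient, labels y in {-1, +1}."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z))
    s = 0.5 * (1.0 - np.tanh(z / 2.0))          # numerically stable 1/(1+e^z)
    grad = -(X * (y * s)[:, None]).mean(axis=0)
    return loss, grad

def solve_subproblem(X, y, lam, n_iter=300):
    """FISTA (accelerated proximal gradient) on the restricted
    l1-regularized logistic subproblem over the active features."""
    L = (np.linalg.norm(X, 2) ** 2) / (4.0 * X.shape[0])  # Lipschitz constant
    step = 1.0 / max(L, 1e-12)
    w = np.zeros(X.shape[1]); v = w.copy(); t = 1.0
    for _ in range(n_iter):
        _, g = loss_grad(v, X, y)
        w_next = prox_l1(v - step * g, step * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

def feature_generating_selection(X, y, budget=3, outer_iters=3, lam=0.01):
    """Outer loop: activate the `budget` most-violated (largest-gradient)
    inactive features, then re-solve the subproblem on the active set."""
    n, d = X.shape
    active = np.array([], dtype=int)
    w_active = np.zeros(0)
    for _ in range(outer_iters):
        w_full = np.zeros(d); w_full[active] = w_active
        _, g = loss_grad(w_full, X, y)
        score = np.abs(g); score[active] = -np.inf   # skip active features
        active = np.union1d(active, np.argsort(score)[-budget:])
        w_active = solve_subproblem(X[:, active], y, lam)
    return active, w_active
```

Because each subproblem only involves the (small) active set, the per-iteration cost is independent of the total dimensionality once the gradient scores are computed, which is the property that makes this style of algorithm attractive in the ultrahigh-dimensional regime.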
