Direct Zero-Norm Optimization for Feature Selection

The zero-norm of a vector, defined as its number of non-zero elements, is an ideal quantity for feature selection. However, zero-norm minimization is generally regarded as a combinatorially hard (indeed NP-hard) optimization problem. In contrast to previous methods, which typically optimize a surrogate of the zero-norm, in this paper we propose a method that optimizes the zero-norm directly for feature selection. Based on Expectation Maximization (EM), the method reduces to solving a sequence of Quadratic Programming (QP) problems and can therefore be optimized in polynomial time in practice. We show that the proposed optimization technique has a natural Bayesian interpretation and converges to the true zero-norm asymptotically, provided that a good starting point is given. Following the same scheme, we further show that an arbitrary-norm Support Vector Machine can be trained in polynomial time. A series of experiments demonstrates that the proposed EM-based zero-norm method outperforms other state-of-the-art feature-selection methods on biological microarray data and UCI data sets, in terms of both accuracy and learning efficiency.
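The abstract only outlines the method, but the mechanism it describes, an EM loop whose maximization step is a quadratic problem, belongs to the family of iteratively reweighted quadratic schemes for approximating the zero-norm. Below is a minimal, hypothetical sketch of that family in Python, assuming a plain least-squares loss with a closed-form weighted ridge update rather than the paper's SVM-style QP; the function name reweighted_zero_norm and the parameters lam and eps are illustrative choices, not taken from the paper.

```python
import numpy as np

def reweighted_zero_norm(X, y, n_iter=50, lam=1.0, eps=1e-8):
    """Iteratively reweighted quadratic scheme approximating zero-norm
    minimization of a linear model w (illustrative sketch; the paper's
    M-step is an SVM-style QP, not this ridge update).

    Each iteration solves
        min_w ||X w - y||^2 + lam * sum_j w_j^2 / (w_prev_j^2 + eps),
    where the penalty term approaches the number of non-zero
    coefficients as w stabilizes, driving small weights toward zero.
    """
    n, d = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # a reasonable starting point
    for _ in range(n_iter):
        # E-step analogue: per-feature penalty weights from the current iterate
        D = lam / (w ** 2 + eps)
        # M-step analogue: a quadratic problem with a closed-form solution
        w = np.linalg.solve(X.T @ X + np.diag(D), X.T @ y)
    w[np.abs(w) < np.sqrt(eps)] = 0.0  # prune numerically zero coefficients
    return w

# Example: recover a 3-sparse signal among 50 candidate features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = reweighted_zero_norm(X, y)
print(np.flatnonzero(w_hat))  # typically [3, 17, 41]
```

The closed-form solve stands in for a QP solver only to keep the sketch self-contained; the fixed-point behavior, in which the penalty weight of a vanishing coefficient grows without bound, is what produces exact zeros after pruning and mirrors why a sequence of quadratic problems can approach the true zero-norm from a good starting point.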
