Concave programming for minimizing the zero-norm over polyhedral sets

Given a nonempty polyhedral set, we consider the problem of finding a vector that belongs to it and has the minimum number of nonzero components, i.e., a feasible vector with minimum zero-norm. This combinatorial optimization problem is NP-hard and arises in various fields such as machine learning, pattern recognition, and signal processing. As one contribution of this paper, we propose two new smooth approximations of the zero-norm function, in which the approximating functions are separable and concave. We first formally prove the equivalence between the approximating problems and the original nonsmooth problem; to this end, we preliminarily state, in a general setting, theoretical conditions sufficient to guarantee the equivalence between pairs of problems. We also define an effective and efficient version of the Frank-Wolfe algorithm for minimizing concave separable functions over polyhedral sets, in which variables that are null at an iteration are eliminated from all subsequent iterations, yielding significant savings in computational time, and we prove the global convergence of the method. Finally, we report numerical results on test problems that show both the usefulness of the new concave formulations and the computational efficiency of the implemented minimization algorithm.
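
The idea of a unit-step Frank-Wolfe method with variable elimination can be illustrated with a minimal sketch. This is not the paper's implementation, and several pieces are assumptions made for illustration only: the surrogate is the exponential concave function sum_i (1 - exp(-alpha * x_i)), known from the concave-minimization literature and not one of the two new approximations proposed here; the polyhedral set is a small hypothetical polytope in the nonnegative orthant, given by an explicit vertex list so that the linear subproblem reduces to a scan over vertices; and all function names and the value of `alpha` are hypothetical.

```python
import math


def concave_l0_surrogate(x, alpha=5.0):
    """Assumed surrogate of the zero-norm for x >= 0: sum_i (1 - exp(-alpha * x_i))."""
    return sum(1.0 - math.exp(-alpha * xi) for xi in x)


def surrogate_gradient(x, alpha=5.0):
    """Gradient of the surrogate above, computed componentwise (it is separable)."""
    return [alpha * math.exp(-alpha * xi) for xi in x]


def zero_norm(x, tol=1e-9):
    """Number of components whose magnitude exceeds tol."""
    return sum(1 for xi in x if abs(xi) > tol)


def frank_wolfe_unit_step(vertices, x0, alpha=5.0, max_iter=50, tol=1e-9):
    """Unit-step Frank-Wolfe over conv(vertices), assumed to lie in x >= 0.

    Components that are zero at an iterate are fixed at zero afterwards,
    which here means dropping every vertex with a nonzero entry there.
    """
    x = list(x0)
    active = list(vertices)
    for _ in range(max_iter):
        # Variable elimination: null components stay null from now on.
        zero_idx = [i for i, xi in enumerate(x) if abs(xi) <= tol]
        active = [v for v in active if all(abs(v[i]) <= tol for i in zero_idx)]
        g = surrogate_gradient(x, alpha)
        # Linear subproblem: minimize g^T v over the remaining vertices.
        best = min(active, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
        # Stop when the linearization offers no further decrease.
        decrease = sum(gi * (bi - xi) for gi, bi, xi in zip(g, best, x))
        if decrease >= -tol:
            break
        x = list(best)  # unit step: move to the minimizing vertex
    return x


# Hypothetical polytope in R^4 and a starting point (centroid of its vertices).
vertices = [
    (1.0, 1.0, 0.0, 0.0),
    (0.0, 2.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 3.0),
    (1.0, 0.0, 1.0, 1.0),
]
x0 = [0.5, 0.75, 0.5, 1.0]
x_star = frank_wolfe_unit_step(vertices, x0)
print(x_star, zero_norm(x_star))
```

Because the surrogate is concave, the full unit step never increases the objective, so no line search is needed. The sketch stops at a stationary vertex of the surrogate problem, which in general need not be a global minimizer of the zero-norm. With a polyhedral set described by linear constraints instead of a vertex list, the vertex scan would be replaced by a linear programming solve, and eliminating variables would shrink that linear program at each iteration.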
