Dual subgradient algorithms for large-scale nonsmooth learning problems

Abstract

“Classical” First Order (FO) algorithms of convex optimization, such as the Mirror Descent algorithm or Nesterov’s optimal algorithm of smooth convex optimization, are well known to have optimal (theoretical) complexity estimates which do not depend on the problem dimension. However, to attain this optimality, the domain of the problem should admit a “good proximal setup”. The latter essentially means that (1) the problem domain should satisfy certain geometric conditions of “favorable geometry”, and (2) we should be able to compute the proximal transformation at a moderate cost at each iteration. More often than not these two conditions are satisfied in optimization problems arising in computational learning, which explains why proximal-type FO methods have recently become the methods of choice for solving various learning problems. Yet they meet their limits in several important problems, such as multi-task learning with a large number of tasks, where the problem domain does not exhibit favorable geometry, and learning and matrix completion problems with a nuclear norm constraint, where the numerical cost of computing the proximal transformation becomes prohibitive at large scale. We propose a novel approach to solving nonsmooth optimization problems arising in learning applications where a Fenchel-type representation of the objective function is available. The approach is based on applying FO algorithms to the dual problem and using the accuracy certificates supplied by the method to recover the primal solution. While suboptimal in terms of accuracy guarantees, the proposed approach does not rely upon a “good proximal setup” for the primal problem; it only requires the problem domain to admit a Linear Optimization oracle, that is, the ability to efficiently maximize a linear form on the domain of the primal problem.
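To make the notion of a Linear Optimization oracle concrete, here is a minimal sketch in Python for two domains of the kind mentioned above: an l1 ball and a nuclear-norm ball. The function names `lmo_l1_ball` and `lmo_nuclear_ball`, the `radius` parameter, and the use of NumPy are illustrative assumptions, not taken from the paper. The sketch calls a full SVD only for brevity; maximizing a linear form over the nuclear-norm ball needs just the leading singular pair, which is what makes the oracle cheaper than a proximal step at large scale.

```python
import numpy as np


def lmo_l1_ball(g, radius=1.0):
    """Return a maximizer of <g, x> over {x : ||x||_1 <= radius}.

    A maximizer is a signed, scaled coordinate vector placed at the
    entry of g with the largest absolute value.
    """
    g = np.asarray(g, dtype=float)
    i = int(np.argmax(np.abs(g)))
    x = np.zeros_like(g)
    x[i] = radius if g[i] >= 0 else -radius
    return x


def lmo_nuclear_ball(G, radius=1.0):
    """Return a maximizer of <G, X> over {X : ||X||_* <= radius}.

    A maximizer is radius * u1 v1^T, where (u1, v1) is the leading
    singular pair of G. Assumption for brevity: a full SVD is used here;
    a large-scale implementation would compute only the top pair.
    """
    G = np.asarray(G, dtype=float)
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return radius * np.outer(U[:, 0], Vt[0, :])


if __name__ == "__main__":
    g = np.array([1.0, -4.0, 2.0])
    x = lmo_l1_ball(g, radius=2.0)
    print("l1 maximizer:", x, "value:", g @ x)

    G = np.diag([3.0, 1.0])
    X = lmo_nuclear_ball(G, radius=2.0)
    print("nuclear-norm maximizer value:", np.sum(G * X))
```

In both cases the optimal value of the linear form equals the radius times the dual norm of the input (the largest absolute entry for the l1 ball, the largest singular value for the nuclear-norm ball), which gives a quick sanity check on the oracle output.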
