Variational Optimization

We discuss a general technique that can be used to form a differentiable bound on the optima of non-differentiable or discrete objective functions. We form a unified description of these methods and consider under which circumstances the bound is concave. In particular we consider two concrete applications of the method, namely sparse learning and support vector classification.

1 Optimization by Variational Bounding

We consider the general problem of function maximization, max_x f(x), for a vector x. When f is differentiable and x is continuous, optimization methods that use gradient information are typically preferred over gradient-free approaches, since they can exploit a locally optimal direction in which to search. However, when f is not differentiable or x is discrete, gradient-based approaches are not directly applicable. In that case, alternatives such as relaxation, coordinate-wise optimization and stochastic approaches are popular [1]. Our interest is to discuss another general class of methods that yield differentiable surrogate objectives for discrete x or non-differentiable f. The Variational Optimization (VO) approach is based on the simple bound

    f^* = \max_{x \in C} f(x) \;\geq\; \langle f(x) \rangle_{p(x|\theta)} \;\equiv\; E(\theta)    (1)

where ⟨·⟩_p denotes expectation with respect to the distribution p defined over the solution space C. The parameters θ of the distribution p(x|θ) can then be adjusted to maximize the lower bound E(θ). The bound can be made trivially tight provided the distribution p(x|θ) is flexible enough to place all of its mass on the optimal state x* = argmax_x f(x). Under mild restrictions the bound is differentiable, see section(1.1), and thus provides a smooth alternative objective function (see also section(4.1) on the relation to 'smoothing' methods). The degree of smoothness (and the deviation from the original objective) increases as the dispersion of the variational distribution increases. In section(1.2) we give sufficient conditions for the variational bound to be concave. The purpose of this paper is to demonstrate the ease with which VO can be applied and to discuss its merits as a general way to construct a smooth alternative objective.

1.1 Differentiability of the variational objective

Even when f(x) is not differentiable, under weak conditions E(θ) can be made differentiable. The gradient of E(θ) is given by

    \frac{\partial E}{\partial \theta} = \frac{\partial}{\partial \theta} \int f(x)\, p(x|\theta)\, dx
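Since E(θ) = ∫ f(x) p(x|θ) dx, the gradient can, under standard regularity conditions, be rewritten as ⟨ f(x) ∂ log p(x|θ)/∂θ ⟩, an expectation that requires no gradient of f and can be estimated by Monte Carlo. Below is a minimal sketch of this idea in Python, assuming a Gaussian variational distribution p(x|θ) = N(x|μ, σ²) over a one-dimensional x; the objective f(x) = -|x - 3|, the step size, the sample size and the σ-shrinking schedule are illustrative choices rather than details taken from the paper.

```python
import numpy as np

# Non-differentiable objective to maximize: f(x) = -|x - 3|, with optimum x* = 3.
def f(x):
    return -np.abs(x - 3.0)

rng = np.random.default_rng(0)

mu, sigma = -5.0, 2.0        # theta = (mu, sigma) of the variational Gaussian p(x|theta)
lr, n_samples = 0.1, 500     # illustrative step size and Monte Carlo sample size

for step in range(300):
    x = rng.normal(mu, sigma, n_samples)
    fx = f(x)
    # Score-function (log-derivative) estimate of dE/dmu, using
    #   d log N(x|mu, sigma^2) / d mu = (x - mu) / sigma^2.
    # Subtracting the sample mean of f as a baseline reduces the variance of the estimate.
    grad_mu = np.mean((fx - fx.mean()) * (x - mu) / sigma**2)
    mu += lr * grad_mu
    sigma = max(0.97 * sigma, 1e-2)  # shrinking the dispersion tightens the bound

print(f"mu after optimization: {mu:.3f} (true optimum 3.0)")
```

Shrinking σ trades smoothness for fidelity to the original objective: a broad p(x|θ) gives a smooth but loose bound, while a narrow one concentrates E(θ) around f near the current mean.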

[1] Rich Caruana et al. Removing the Genetics from the Standard Genetic Algorithm. ICML, 1995.

[2] Corinna Cortes et al. Support-Vector Networks. Machine Learning, 1995.

[3] Arnaud Berny. Selection and Reinforcement Learning for Combinatorial Optimization. PPSN, 2000.

[4] C. Lemaréchal. Chapter VII: Nondifferentiable Optimization. 1989.

[5] David Barber et al. Concave Gaussian Variational Approximations for Inference in Large-Scale Bayesian Linear Models. AISTATS, 2011.

[7] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[8] A. Berny. Statistical Machine Learning and Combinatorial Optimization. 2001.

[9] Shuiwang Ji et al. SLEP: Sparse Learning with Efficient Projections. 2011.

[10] David Barber. Bayesian Reasoning and Machine Learning. 2012.

[11] Wenjiang J. Fu. Penalized Regressions: The Bridge versus the Lasso. 1998.

[12] J. A. Lozano et al. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. 2001.

[13] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[14] Chih-Jen Lin et al. Working Set Selection Using Second Order Information for Training Support Vector Machines. Journal of Machine Learning Research, 2005.

[15] J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage. 1980.

[16] R. Tibshirani et al. Pathwise Coordinate Optimization. arXiv:0708.1485, 2007.

[17] Marcus Gallagher et al. Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift. Evolutionary Computation, 2005.

[18] Olivier Chapelle. Training a Support Vector Machine in the Primal. Neural Computation, 2007.