Open Problem: Fast Stochastic Exp-Concave Optimization

Stochastic exp-concave optimization is an important primitive in machine learning that captures several fundamental problems, including linear regression, logistic regression and more. The exp-concavity property allows for fast convergence rates, as compared to general stochastic optimization. However, current algorithms that attain such rates scale poorly with the dimension n, requiring Ω(n²) time per iteration even on very simple instances of the problem. The question we pose is whether it is possible to obtain fast rates for exp-concave functions using more computationally efficient algorithms.

Consider the problem of minimizing a convex function F over a convex set K ⊆ Rn, where our only access to F is via a stochastic gradient oracle that, given a point x ∈ K, returns a random vector ĝx for which E[ĝx] = ∇F(x). We make the following assumptions:

(i) F is α-exp-concave and twice differentiable; that is, if gx = ∇F(x) and Hx = ∇²F(x) are the gradient and Hessian at some point x ∈ K, then Hx ⪰ α·gx gx⊤ (the standard calculation behind this condition is recalled below).

(ii) The gradient oracle satisfies ‖ĝx‖2 ≤ G with probability 1 at any point x ∈ K, for some positive constant G.

(iii) For concreteness, we assume that K = {x ∈ Rn : ‖x‖2 ≤ 1} is the Euclidean unit ball.

An important special case is when F is given as an expectation F(x) = Ez∼D[f(x, z)] over an unknown distribution D of parameters z, where for every fixed parameter value z the function f(x, z) is α-exp-concave with gradients bounded by G. Indeed, this implies that F is itself α-exp-concave (see Appendix A). Given the ability to sample from the distribution D, we can implement a gradient oracle by setting ĝx = ∇f(x, z) where z ∼ D; a minimal sketch of such an oracle is given below.

For example, f(x, (a, b)) = ½(a⊤x − b)² corresponds to linear regression. In a learning scenario it is reasonable to assume that f(x, (a, b)) ≤ M with probability 1 for some constant M, which also guarantees that f is exp-concave with α = 1/M. Additional examples include the log-loss f(x, a) = −log(a⊤x) and the logistic loss f(x, (a, b)) = log(1 + exp(−b·a⊤x)), both of which are exp-concave provided that a, b and x are properly bounded.

The goal of an optimization algorithm, given a target accuracy ε, is to compute a point x for which F(x) − minx∈K F(x) ≤ ε (either in expectation, or with high probability). The standard approach to general stochastic optimization, namely the Stochastic Gradient Descent algorithm, computes an ε-approximate solution using O(1/ε²) oracle queries. Since each iteration runs in linear time¹, the total runtime of this approach is O(n/ε²); a sketch of this baseline is also given below.

¹ We assume that an oracle query runs in time O(1).
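Assumption (i) identifies α-exp-concavity with a condition on the Hessian. As a quick check of this identification (a standard calculation, not part of the original statement): α-exp-concavity of F means that exp(−αF) is concave, and for twice-differentiable F,

    ∇²exp(−αF(x)) = α·exp(−αF(x))·( α·∇F(x)∇F(x)⊤ − ∇²F(x) ),

which is negative semidefinite at every x ∈ K exactly when ∇²F(x) ⪰ α·∇F(x)∇F(x)⊤, i.e., Hx ⪰ α·gx gx⊤.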
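To make the oracle model concrete, the following Python sketch implements a stochastic gradient oracle for the linear-regression special case f(x, (a, b)) = ½(a⊤x − b)², by sampling z = (a, b) ∼ D and returning ∇f(x, z). The sampler sample_pair and all names here are assumptions of the sketch, not part of the problem statement.

import numpy as np

def make_linear_regression_oracle(sample_pair):
    """Stochastic gradient oracle for F(x) = E_{(a,b)~D}[ (1/2)*(a^T x - b)^2 ].

    `sample_pair` is assumed to draw one pair (a, b) from the unknown
    distribution D; the oracle returns g_hat = (a^T x - b)*a, an unbiased
    estimate of the gradient of F at x.
    """
    def oracle(x):
        a, b = sample_pair()            # z = (a, b) ~ D
        return (a @ x - b) * a          # gradient of (1/2)*(a^T x - b)^2 in x
    return oracle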
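Similarly, the following is a minimal Python sketch of the SGD baseline described above, over the Euclidean unit ball of assumption (iii). The step size 1/(G·√T) is one standard choice for bounded gradients and, like the function names, is an assumption of the sketch rather than part of the problem statement.

import numpy as np

def project_to_unit_ball(x):
    # Euclidean projection onto K = {x : ||x||_2 <= 1}.
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def sgd_baseline(oracle, n, T, G):
    # Projected SGD: T oracle queries, O(n) arithmetic per query.
    # With T = O(1/eps^2), the averaged iterate x_bar satisfies
    # E[F(x_bar)] - min F <= eps under the stated assumptions.
    x = np.zeros(n)
    x_bar = np.zeros(n)
    eta = 1.0 / (G * np.sqrt(T))        # standard step size for bounded gradients
    for t in range(T):
        g_hat = oracle(x)               # one stochastic gradient query
        x = project_to_unit_ball(x - eta * g_hat)
        x_bar += (x - x_bar) / (t + 1)  # running average of the iterates
    return x_bar

Each iteration above touches only O(n) numbers, so with T = O(1/ε²) queries the total runtime is O(n/ε²), matching the baseline discussed in the text.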