Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization

Relative to the large literature on upper bounds on the complexity of convex optimization, less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We introduce a new notion of discrepancy between functions, and use it to reduce problems of stochastic convex optimization to statistical parameter estimation, which can be lower bounded using information-theoretic methods. Using this approach, we improve upon known results and obtain tight minimax complexity estimates for various function classes.
