Enhanced Bilevel Optimization via Bregman Distance

Bilevel optimization has been widely applied to many machine learning problems, such as hyperparameter optimization, policy optimization, and meta-learning. Although many bilevel optimization methods have recently been proposed, they still suffer from high computational complexity and do not handle the more general bilevel problems with nonsmooth regularization. In this paper, we therefore propose a class of efficient bilevel optimization methods based on Bregman distance. In our methods, we use the mirror descent iteration with strongly-convex Bregman functions to solve the outer subproblem of the bilevel problem. Specifically, we propose a bilevel optimization method based on Bregman distance (BiO-BreD) for solving deterministic bilevel problems, which achieves lower computational complexity than the best known results. We also propose a stochastic bilevel optimization method (SBiO-BreD) for solving stochastic bilevel problems, based on stochastic approximated gradients and Bregman distance. Further, we propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) by using a variance-reduction technique. Moreover, we prove that ASBiO-BreD improves on the best known computational complexity with respect to the condition number κ and the target accuracy ε for finding an ε-stationary point of nonconvex-strongly-convex bilevel problems. In particular, our methods can solve bilevel optimization problems with nonsmooth regularization at a lower computational complexity.
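To make the mirror descent iteration concrete, here is a minimal illustrative sketch of one mirror descent step with a strongly-convex Bregman function. This is not the authors' BiO-BreD algorithm; it only demonstrates the building block the abstract refers to, using the negative-entropy Bregman distance on the probability simplex (the classic exponentiated-gradient update). All names and the toy linear objective are assumptions for illustration.

```python
import math

def mirror_descent_step(x, grad, eta):
    """One mirror descent step over the probability simplex.

    With the negative-entropy Bregman function phi(x) = sum_i x_i log x_i,
    the update argmin_x <grad, x> + (1/eta) * D_phi(x, x_t) has the
    closed form x_{t+1,i} proportional to x_{t,i} * exp(-eta * grad_i).
    """
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    s = sum(w)
    return [wi / s for wi in w]

# Toy example: minimize the linear objective f(x) = <c, x> over the simplex.
# The minimizer puts all mass on the coordinate with the smallest c_i.
c = [3.0, 1.0, 2.0]                 # constant gradient of a linear objective
x = [1.0 / 3.0] * 3                 # start at the uniform distribution
for _ in range(200):
    x = mirror_descent_step(x, c, eta=0.1)
```

After enough iterations the iterate concentrates on coordinate 1 (the smallest cost), while every iterate stays on the simplex by construction, which is the practical appeal of matching the Bregman function to the constraint geometry.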
