Enhanced Bilevel Optimization via Bregman Distance

Bilevel optimization has been widely applied to many machine learning problems, such as hyperparameter optimization, policy optimization, and meta-learning. Although many bilevel optimization methods have recently been proposed, they still suffer from high computational complexity and do not handle the more general bilevel problems with nonsmooth regularization. In this paper, we therefore propose a class of efficient bilevel optimization methods based on Bregman distance. In our methods, we use the mirror descent iteration with strongly-convex Bregman functions to solve the outer subproblem of the bilevel problem. Specifically, we propose a bilevel optimization method based on Bregman distance (BiO-BreD) for solving deterministic bilevel problems, which achieves lower computational complexity than the best known results. We also propose a stochastic bilevel optimization method (SBiO-BreD) for solving stochastic bilevel problems, based on stochastic approximated gradients and Bregman distance. Further, we propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) by using a variance-reduction technique. Moreover, we prove that ASBiO-BreD improves on the best known computational complexity with respect to the condition number κ and the target accuracy ε for finding an ε-stationary point of nonconvex-strongly-convex bilevel problems. In particular, our methods can solve bilevel optimization problems with nonsmooth regularization at a lower computational complexity.
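To make the mirror descent iteration concrete, here is a minimal illustrative sketch of one mirror descent step with a strongly-convex Bregman function. This is not the authors' BiO-BreD algorithm; it only demonstrates the building block the abstract refers to, using the negative-entropy Bregman distance on the probability simplex (the classic exponentiated-gradient update). All names and the toy linear objective are assumptions for illustration.

```python
import math

def mirror_descent_step(x, grad, eta):
    """One mirror descent step over the probability simplex.

    With the negative-entropy Bregman function phi(x) = sum_i x_i log x_i,
    the update argmin_x <grad, x> + (1/eta) * D_phi(x, x_t) has the
    closed form x_{t+1,i} proportional to x_{t,i} * exp(-eta * grad_i).
    """
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    s = sum(w)
    return [wi / s for wi in w]

# Toy example: minimize the linear objective f(x) = <c, x> over the simplex.
# The minimizer puts all mass on the coordinate with the smallest c_i.
c = [3.0, 1.0, 2.0]                 # constant gradient of a linear objective
x = [1.0 / 3.0] * 3                 # start at the uniform distribution
for _ in range(200):
    x = mirror_descent_step(x, c, eta=0.1)
```

After enough iterations the iterate concentrates on coordinate 1 (the smallest cost), while every iterate stays on the simplex by construction, which is the practical appeal of matching the Bregman function to the constraint geometry.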
