Projection-Free Algorithm for Stochastic Bi-level Optimization

This work presents the first projection-free algorithm for stochastic bi-level optimization problems, in which the objective function depends on the solution of another stochastic optimization problem. The proposed Stochastic Bi-level Frank-Wolfe (SBFW) algorithm applies to streaming settings and requires neither large batches nor checkpoints. The sample complexity of SBFW is shown to be O(ε^{-3}) for convex objectives and O(ε^{-4}) for non-convex objectives. Improved rates are derived for the stochastic compositional problem, a special case of the bi-level problem that entails minimizing the composition of two expected-value functions. The proposed Stochastic Compositional Frank-Wolfe (SCFW) algorithm is shown to achieve a sample complexity of O(ε^{-2}) for convex objectives and O(ε^{-3}) for non-convex objectives, on par with the state-of-the-art sample complexities of projection-free algorithms for single-level problems. We demonstrate the advantage of the proposed methods on the problem of matrix completion with denoising and on policy value evaluation in reinforcement learning.
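
Since the abstract centers on the projection-free (Frank-Wolfe) template, a minimal sketch may help fix ideas. The code below is not the paper's SBFW/SCFW method; it is a generic one-sample stochastic Frank-Wolfe loop with a momentum-averaged gradient tracker, run on an assumed synthetic least-squares objective over an l1 ball. The objective, constraint set, and the rho/eta schedules are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one-sample stochastic Frank-Wolfe: each iteration
# draws a single fresh sample (streaming, no large batches), averages it
# into a gradient tracker, and calls a linear minimization oracle (LMO)
# instead of a projection. Illustrative only; not the paper's SBFW/SCFW.

rng = np.random.default_rng(0)
d, radius, T = 20, 5.0, 500

# Assumed synthetic problem: f(x) = E[(a^T x - b)^2] / 2 over an l1 ball.
# x_true is scaled so that it is feasible for the chosen radius.
x_true = rng.normal(size=d) / np.sqrt(d)

def sample_grad(x):
    """Unbiased stochastic gradient from one fresh sample."""
    a = rng.normal(size=d)
    b = a @ x_true + 0.1 * rng.normal()
    return (a @ x - b) * a

def lmo_l1(g, radius):
    """LMO over the l1 ball: argmin_{||s||_1 <= radius} <g, s>."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

x = np.zeros(d)          # feasible initial point
m = sample_grad(x)       # momentum-averaged gradient tracker
for t in range(1, T + 1):
    rho = 1.0 / (t + 1) ** (2 / 3)   # tracking weight (assumed schedule)
    eta = 2.0 / (t + 2)              # Frank-Wolfe step size
    m = (1 - rho) * m + rho * sample_grad(x)  # update gradient estimate
    s = lmo_l1(m, radius)            # linear subproblem, no projection
    x = x + eta * (s - x)            # convex combination stays feasible

print("estimation error:", np.linalg.norm(x - x_true))
```

Because each iterate is a convex combination of feasible points, the update never leaves the constraint set, which is what makes the method projection-free; the bi-level and compositional variants in the paper replace the plain gradient estimate with trackers for the nested stochastic quantities.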
