Projection-Free Algorithm for Stochastic Bi-level Optimization

This work presents the first projection-free algorithm for stochastic bi-level optimization problems, in which the objective function depends on the solution of another stochastic optimization problem. The proposed Stochastic Bi-level Frank-Wolfe (SBFW) algorithm applies to streaming settings and requires neither large batches nor checkpoints. The sample complexity of SBFW is shown to be O(ε^{-3}) for convex objectives and O(ε^{-4}) for non-convex objectives. Improved rates are derived for the stochastic compositional problem, a special case of the bi-level problem that entails minimizing the composition of two expected-value functions. The proposed Stochastic Compositional Frank-Wolfe (SCFW) algorithm is shown to achieve a sample complexity of O(ε^{-2}) for convex objectives and O(ε^{-3}) for non-convex objectives, on par with the state-of-the-art sample complexities of projection-free algorithms for single-level problems. We demonstrate the advantage of the proposed methods on the problem of matrix completion with denoising and on policy value evaluation in reinforcement learning.
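
Since the abstract centers on the projection-free (Frank-Wolfe) template, a minimal sketch may help fix ideas. The code below is not the paper's SBFW/SCFW method; it is a generic one-sample stochastic Frank-Wolfe loop with a momentum-averaged gradient tracker, run on an assumed synthetic least-squares objective over an l1 ball. The objective, constraint set, and the rho/eta schedules are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one-sample stochastic Frank-Wolfe: each iteration
# draws a single fresh sample (streaming, no large batches), averages it
# into a gradient tracker, and calls a linear minimization oracle (LMO)
# instead of a projection. Illustrative only; not the paper's SBFW/SCFW.

rng = np.random.default_rng(0)
d, radius, T = 20, 5.0, 500

# Assumed synthetic problem: f(x) = E[(a^T x - b)^2] / 2 over an l1 ball.
# x_true is scaled so that it is feasible for the chosen radius.
x_true = rng.normal(size=d) / np.sqrt(d)

def sample_grad(x):
    """Unbiased stochastic gradient from one fresh sample."""
    a = rng.normal(size=d)
    b = a @ x_true + 0.1 * rng.normal()
    return (a @ x - b) * a

def lmo_l1(g, radius):
    """LMO over the l1 ball: argmin_{||s||_1 <= radius} <g, s>."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

x = np.zeros(d)          # feasible initial point
m = sample_grad(x)       # momentum-averaged gradient tracker
for t in range(1, T + 1):
    rho = 1.0 / (t + 1) ** (2 / 3)   # tracking weight (assumed schedule)
    eta = 2.0 / (t + 2)              # Frank-Wolfe step size
    m = (1 - rho) * m + rho * sample_grad(x)  # update gradient estimate
    s = lmo_l1(m, radius)            # linear subproblem, no projection
    x = x + eta * (s - x)            # convex combination stays feasible

print("estimation error:", np.linalg.norm(x - x_true))
```

Because each iterate is a convex combination of feasible points, the update never leaves the constraint set, which is what makes the method projection-free; the bi-level and compositional variants in the paper replace the plain gradient estimate with trackers for the nested stochastic quantities.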
