Paper information - Accelerating Quadratic Optimization with Reinforcement Learning

Accelerating Quadratic Optimization with Reinforcement Learning

First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Mészáros problems. Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/.
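To make the idea of a parameter-tuning policy concrete, the following is a minimal sketch, not the RLQP implementation. It assumes the tuned parameter is the ADMM penalty `rho`, which the abstract does not state. It also restricts the QP to a diagonal cost with box constraints, so that an OSQP-style iteration fits in plain Python. The names `solve_box_qp`, `fixed_policy`, and `residual_balancing_policy` are hypothetical. The residual-balancing rule is a hand-written heuristic standing in for a learned policy.

```python
import math

def fixed_policy(rho, r_prim, r_dual):
    """Baseline: never change the penalty parameter."""
    return rho

def residual_balancing_policy(rho, r_prim, r_dual):
    """Hand-written stand-in for a learned policy (an assumption, not RLQP):
    scale rho by sqrt(primal residual / dual residual), with clipping."""
    ratio = max(r_prim, 1e-12) / max(r_dual, 1e-12)
    return min(max(rho * math.sqrt(ratio), 1e-6), 1e6)

def solve_box_qp(p, q, lo, hi, rho_policy, rho=0.1, sigma=1e-6,
                 tol=1e-6, max_iter=10000):
    """ADMM for: minimize sum(0.5*p[i]*x[i]**2 + q[i]*x[i])
    subject to lo[i] <= x[i] <= hi[i], with all p[i] > 0.
    Returns (solution, iterations used)."""
    n = len(p)
    x, z, y = [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(1, max_iter + 1):
        # x-update: closed form because the cost is diagonal and A = I.
        x = [(sigma * x[i] - q[i] + rho * z[i] - y[i]) / (p[i] + sigma + rho)
             for i in range(n)]
        # z-update: project x + y/rho onto the box.
        z = [min(max(x[i] + y[i] / rho, lo[i]), hi[i]) for i in range(n)]
        # y-update: dual ascent step on the constraint x = z.
        y = [y[i] + rho * (x[i] - z[i]) for i in range(n)]
        # Infinity-norm primal and dual residuals.
        r_prim = max(abs(x[i] - z[i]) for i in range(n))
        r_dual = max(abs(p[i] * x[i] + q[i] + y[i]) for i in range(n))
        if r_prim <= tol and r_dual <= tol:
            return z, k
        # The policy observes the residuals and proposes the next rho.
        rho = rho_policy(rho, r_prim, r_dual)
    return z, max_iter

if __name__ == "__main__":
    p, q = [1.0, 100.0, 0.01], [-1.0, -50.0, 1.0]
    lo, hi = [0.0] * 3, [0.8] * 3
    for policy in (fixed_policy, residual_balancing_policy):
        sol, iters = solve_box_qp(p, q, lo, hi, policy)
        print(policy.__name__, iters, [round(v, 4) for v in sol])
```

The solver takes the policy as a plain callable, so under these assumptions a trained network could be swapped in for the heuristic without touching the iteration itself.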
