Accelerating Quadratic Optimization with Reinforcement Learning

First-order methods for quadratic optimization, such as OSQP, are widely used in large-scale machine learning and embedded optimal control, where many related problems must be solved rapidly. These methods face two persistent challenges: manual hyperparameter tuning and slow convergence to high-accuracy solutions. To address both, we explore how Reinforcement Learning (RL) can learn a policy that tunes solver parameters to accelerate convergence. In experiments on well-known QP benchmarks, we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems of varying dimension and structure from different applications, including the QPLIB, Netlib LP, and Maros-Mészáros problems. Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/.
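Concretely, the idea is to interleave the solver's ADMM iterations with a policy that adapts the step-size parameter rho based on the current primal and dual residuals. The sketch below illustrates that outer loop using stock OSQP's Python interface; the `policy` function here is a hypothetical stand-in (a classic residual-balancing heuristic) for the learned RLQP network, and the random QP is purely illustrative. It is a minimal sketch, not the authors' implementation, which adapts rho per-constraint inside a modified solver.

```python
# Minimal sketch (assumptions noted above): short bursts of OSQP ADMM
# iterations alternated with rho updates chosen by a policy.
import numpy as np
import scipy.sparse as sp
import osqp

def policy(pri_res, dua_res, rho):
    # Hypothetical placeholder for the learned policy: rescale rho by the
    # ratio of primal to dual residuals (a standard ADMM heuristic).
    return float(np.clip(rho * np.sqrt(pri_res / max(dua_res, 1e-12)), 1e-6, 1e6))

# Small random QP: minimize 0.5 x'Px + q'x  subject to  l <= Ax <= u
np.random.seed(0)
n, m = 10, 20
M = np.random.randn(n, n)
P = sp.csc_matrix(M @ M.T)           # positive semidefinite objective
q = np.random.randn(n)
A = sp.csc_matrix(np.random.randn(m, n))
l, u = -np.ones(m), np.ones(m)

rho = 0.1
solver = osqp.OSQP()
solver.setup(P, q, A, l, u, rho=rho, adaptive_rho=False,
             max_iter=50, verbose=False)          # run ADMM in short bursts

for step in range(20):
    res = solver.solve()                          # warm-starts from the last iterate
    if res.info.status == 'solved':
        break
    rho = policy(res.info.pri_res, res.info.dua_res, rho)
    solver.update_settings(rho=rho)               # apply the policy's new rho

print(res.info.status, res.info.iter, rho)
```

Running ADMM in short bursts and warm-starting each subsequent solve keeps the per-update cost low relative to the iterations the adapted rho can save, which is the regime the abstract's speedups refer to.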
