∇-Prox: Differentiable Proximal Algorithm Modeling for Large-Scale Optimization

Tasks across diverse application domains can be posed as large-scale optimization problems, including tasks in graphics, vision, machine learning, imaging, health, scheduling, planning, and energy system forecasting. Independently of the application domain, proximal algorithms have emerged as a formal optimization method that successfully solves a wide array of existing problems, often exploiting problem-specific structures in the optimization. Although model-based formal optimization provides a principled approach to problem modeling with convergence guarantees, it seems, at first glance, to be at odds with black-box deep learning methods. A recent line of work shows that, when combined with learning-based ingredients, model-based optimization methods are effective, interpretable, and generalize to a wide spectrum of applications with little or no extra training data. However, experimenting with such hybrid approaches for different tasks by hand requires domain expertise in both proximal optimization and deep learning, and is often error-prone and time-consuming. Moreover, naively unrolling these iterative methods produces lengthy compute graphs which, when differentiated via autograd techniques, result in exploding memory consumption, making batch-based training challenging. In this work, we introduce ∇-Prox, a domain-specific modeling language and compiler for large-scale optimization problems using differentiable proximal algorithms. ∇-Prox allows users to specify optimization objective functions of unknowns concisely at a high level, and intelligently compiles the problem into compute- and memory-efficient differentiable solvers. One of the core features of ∇-Prox is its full differentiability, which supports hybrid model- and learning-based solvers integrating proximal optimization with neural network pipelines.
Example applications of this methodology include learning-based priors and/or sample-dependent inner-loop optimization schedulers, learned with deep equilibrium learning or deep reinforcement learning. We show that, with a few lines of code, ∇-Prox can generate performant solvers for a range of image optimization problems, including end-to-end computational optics, image deraining, and compressive magnetic resonance imaging. We also demonstrate that ∇-Prox can be used in a completely orthogonal application domain, energy system planning, an essential task in the energy crisis and the clean energy transition, where it outperforms state-of-the-art CVXPY and commercial Gurobi solvers.
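
For readers unfamiliar with proximal algorithms, the following is a minimal sketch of one classic instance, proximal gradient descent (ISTA) applied to the lasso problem min_x 0.5·||Ax − b||² + λ·||x||₁. It is written in plain NumPy and does not use or represent the ∇-Prox API; the function names `soft_threshold`, `ista`, and `lasso_objective` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(A, b, x, lam):
    """Value of 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def ista(A, b, lam, num_iters=200):
    """Proximal gradient descent for the lasso, starting from x = 0."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of the smooth term 0.5 * ||A x - b||^2.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                          # gradient step on the smooth term
        x = soft_threshold(x - step * grad, step * lam)   # proximal step on the l1 term
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 10))
    b = rng.standard_normal(20)
    x_hat = ista(A, b, lam=0.5)
    print(lasso_objective(A, b, x_hat, 0.5))
```

Each iteration alternates a gradient step on the smooth data term with a proximal step on the nonsmooth regularizer. Backpropagating through every iteration of such a loop with autograd would store all intermediate iterates, which is the memory growth of naive unrolling that the abstract describes.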
