Alternating Differentiation for Optimization Layers

The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods implicitly differentiate the Karush–Kuhn–Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we develop a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into primal and dual updates performed in an alternating fashion. This substantially reduces the dimensions of the Jacobian matrix and thus significantly increases the speed of implicit differentiation. We further analyze the computational complexity of the forward and backward passes of Alt-Diff and show that the backward pass enjoys quadratic complexity. Another notable difference between Alt-Diff and state-of-the-art methods is that Alt-Diff can be truncated for the optimization layer. We show theoretically that: 1) Alt-Diff converges to the same gradients as those obtained by differentiating the KKT conditions; 2) the error between the gradient obtained by truncated Alt-Diff and by differentiating the KKT conditions is upper bounded by the same order as the truncation error of the variables. Therefore, Alt-Diff can be truncated to further increase computational speed without sacrificing much accuracy. A series of comprehensive experiments demonstrates that Alt-Diff yields results comparable to state-of-the-art methods in far less time.
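To make the alternating scheme concrete, the following is a minimal NumPy sketch that differentiates a parametrized quadratic program (minimize 0.5 x'Px + q'x subject to Ax = b, Gx <= h) with respect to its linear coefficient q, carrying the Jacobians of the primal, slack, and dual variables alongside ADMM-style updates. The particular splitting, the step size rho, the truncation criterion, and the name alt_diff_qp are illustrative assumptions made for this sketch, not the paper's reference implementation.

import numpy as np

def alt_diff_qp(P, q, A, b, G, h, rho=1.0, iters=500, tol=1e-8):
    """Solve the QP by ADMM-style updates and return (x, dx/dq)."""
    n, m, p = q.size, b.size, h.size
    x, s = np.zeros(n), np.zeros(p)          # primal and slack variables
    lam, nu = np.zeros(m), np.zeros(p)       # dual variables
    # Jacobians of (x, s, lam, nu) with respect to q, updated in lockstep.
    dx, ds = np.zeros((n, n)), np.zeros((p, n))
    dlam, dnu = np.zeros((m, n)), np.zeros((p, n))

    K = P + rho * (A.T @ A + G.T @ G)        # fixed matrix in the primal update
    K_inv = np.linalg.inv(K)                 # invert once, reuse every iteration

    for _ in range(iters):
        x_prev = x
        # Primal update (x-step of the augmented Lagrangian) and its Jacobian.
        rhs = -(q + A.T @ lam + G.T @ nu - rho * A.T @ b + rho * G.T @ (s - h))
        x = K_inv @ rhs
        dx = -K_inv @ (np.eye(n) + A.T @ dlam + G.T @ dnu + rho * G.T @ ds)
        # Slack update: projection onto the nonnegative orthant; its Jacobian
        # applies a 0/1 mask to the pre-projection Jacobian.
        pre = h - G @ x - nu / rho
        mask = (pre > 0).astype(float)[:, None]
        s = np.maximum(pre, 0.0)
        ds = mask * (-G @ dx - dnu / rho)
        # Dual updates and their Jacobians.
        lam = lam + rho * (A @ x - b)
        dlam = dlam + rho * (A @ dx)
        nu = nu + rho * (G @ x + s - h)
        dnu = dnu + rho * (G @ dx + ds)
        if np.linalg.norm(x - x_prev) < tol:  # truncation / stopping criterion
            break
    return x, dx

Because the matrix in the primal update does not change across iterations, it is inverted (or factorized) once and reused, and each Jacobian update only involves matrix products of moderate size rather than a full KKT system. Stopping the loop early gives the truncated variant discussed above, whose gradient error tracks the truncation error of the iterates.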
