Paper information - OptNet: Differentiable Optimization as a Layer in Neural Networks

This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. In one notable example, we show that the method is capable of learning to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of our architecture to learn hard constraints better than other neural architectures.
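The implicit-differentiation idea mentioned in the abstract can be illustrated on a deliberately simplified case. The sketch below is not the paper's solver: it assumes a quadratic program with equality constraints only (no inequalities, no interior point method, no batching or GPU), so the solution is given by a single linear KKT system. The function names `solve_eq_qp` and `grads_wrt_q_b` are hypothetical and chosen for illustration. Under these assumptions, the backward pass needs one extra solve against the same KKT matrix, and the result is checked against finite differences.

```python
import numpy as np

def solve_eq_qp(Q, q, A, b):
    """Solve min_z 0.5 z^T Q z + q^T z  s.t.  A z = b via its KKT system.

    Simplifying assumption: equality constraints only, Q symmetric positive
    definite, A full row rank, so the KKT matrix K is nonsingular.
    """
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n], sol[n:], K  # primal z, dual nu, KKT matrix

def grads_wrt_q_b(K, n, dl_dz):
    """Backward pass by implicit differentiation of K [z; nu] = [-q; b].

    Because K is symmetric, one solve with right-hand side [dl/dz; 0]
    yields both gradients: dl/dq = -d_z and dl/db = d_nu.
    """
    m = K.shape[0] - n
    d = np.linalg.solve(K, np.concatenate([dl_dz, np.zeros(m)]))
    return -d[:n], d[n:]

# Small random problem (illustrative data only).
rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)          # symmetric positive definite
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
target = rng.standard_normal(n)

def loss(q_, b_):
    """Downstream loss applied to the layer output z*(q, b)."""
    z_, _, _ = solve_eq_qp(Q, q_, A, b_)
    return 0.5 * np.sum((z_ - target) ** 2)

# Forward solve, then analytic gradients from one extra KKT solve.
z, nu, K = solve_eq_qp(Q, q, A, b)
grad_q, grad_b = grads_wrt_q_b(K, n, z - target)

# Central finite differences as an independent check.
eps = 1e-6
fd_q = np.array([(loss(q + eps * e, b) - loss(q - eps * e, b)) / (2 * eps)
                 for e in np.eye(n)])
fd_b = np.array([(loss(q, b + eps * e) - loss(q, b - eps * e)) / (2 * eps)
                 for e in np.eye(m)])

print("max |grad_q - fd_q|:", np.max(np.abs(grad_q - fd_q)))
print("max |grad_b - fd_b|:", np.max(np.abs(grad_b - fd_b)))
```

The design point this sketch is meant to convey is that the backward pass reuses the linear system already formed for the forward solve, which is consistent with the abstract's remark that gradients come at little additional cost. Handling inequality constraints, as the paper does, requires differentiating the full KKT conditions including complementarity, which this sketch omits.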

J. Zico Kolter | Brandon Amos
