OptNet: Differentiable Optimization as a Layer in Neural Networks

This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. In one notable example, we show that the method is capable of learning to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of our architecture to learn hard constraints better than other neural architectures.
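To make the implicit-differentiation idea concrete, the following is a minimal sketch and not the solver described above. It assumes a simplified layer with equality constraints only: minimize (1/2) z^T Q z + q^T z subject to A z = b. It uses a dense NumPy solve in place of a batched GPU primal-dual interior point method. The function names `solve_eq_qp` and `qp_backward` and the squared-error loss are hypothetical, chosen for illustration. The point it shows is that the backward pass reuses the same KKT matrix as the forward solve, so gradients with respect to the layer parameters q and b cost one extra linear solve.

```python
import numpy as np

def solve_eq_qp(Q, q, A, b):
    """Forward pass: solve min 0.5 z'Qz + q'z  s.t. Az = b via its KKT system."""
    n, m = Q.shape[0], A.shape[0]
    # KKT conditions: [Q A'; A 0] [z; nu] = [-q; b]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n], sol[n:], K

def qp_backward(K, n, grad_z):
    """Backward pass: map dL/dz to (dL/dq, dL/db) by implicit differentiation.

    Since [z; nu] = K^{-1} [-q; b] and K is symmetric, the gradient with
    respect to the right-hand side is K^{-1} [dL/dz; 0].
    """
    rhs = np.concatenate([grad_z, np.zeros(K.shape[0] - n)])
    w = np.linalg.solve(K, rhs)
    return -w[:n], w[n:]  # the sign flip comes from the -q in the right-hand side

# Small random problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)          # symmetric positive definite
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
target = rng.standard_normal(n)

def loss(q_, b_):
    """Hypothetical downstream loss applied to the layer output z."""
    z_, _, _ = solve_eq_qp(Q, q_, A, b_)
    return 0.5 * np.sum((z_ - target) ** 2)

z, nu, K = solve_eq_qp(Q, q, A, b)
grad_q, grad_b = qp_backward(K, n, z - target)

# Central finite differences as an independent check of the analytic gradients.
eps = 1e-6
fd_q = np.array([(loss(q + eps * e, b) - loss(q - eps * e, b)) / (2 * eps)
                 for e in np.eye(n)])
fd_b = np.array([(loss(q, b + eps * e) - loss(q, b - eps * e)) / (2 * eps)
                 for e in np.eye(m)])
print(np.max(np.abs(grad_q - fd_q)), np.max(np.abs(grad_b - fd_b)))
```

Handling inequality constraints would additionally require the complementarity conditions and the dual variables from an interior point solve, and differentiating with respect to Q and A requires further terms; neither is covered by this sketch.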
