Strong mixed-integer programming formulations for trained neural networks

We present strong mixed-integer programming (MIP) formulations for high-dimensional piecewise linear functions that correspond to trained neural networks. These formulations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or solving decision problems whose objective function is a machine learning model. We present a generic framework, which may be of independent interest, for constructing sharp or ideal formulations of the maximum of d affine functions over arbitrary polyhedral input domains. We apply this result to derive MIP formulations for several of the most popular nonlinear operations (e.g., ReLU and max pooling) that are strictly stronger than other approaches from the literature. We corroborate this computationally, showing that our formulations offer substantial improvements in solve time on verification tasks for image classification networks.
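For context, the following is a minimal sketch of the standard big-M formulation from the prior literature that formulations of this kind aim to strengthen; it is not the paper's own formulation, and the bound notation L, U and the binary indicator z are assumptions of this sketch. For a single ReLU neuron y = max{0, w^T x + b} whose pre-activation is known to satisfy finite bounds L <= w^T x + b <= U with L < 0 < U, the big-M formulation is

\[
\begin{aligned}
y &\ge w^\top x + b, \\
y &\ge 0, \\
y &\le w^\top x + b - L(1 - z), \\
y &\le U z, \\
z &\in \{0, 1\}.
\end{aligned}
\]

When z = 0 the constraints force y = 0 and w^T x + b <= 0 (the neuron is inactive); when z = 1 they force y = w^T x + b >= 0 (the neuron is active), so integer-feasible points recover the ReLU exactly. Its LP relaxation, however, can be loose when |L| and U are large, which is precisely the weakness that the sharp and ideal formulations described in the abstract are designed to eliminate.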
