Lipschitz constant estimation of Neural Networks via sparse polynomial optimization

We introduce LiPopt, a polynomial optimization framework for computing increasingly tight upper bounds on the Lipschitz constant of neural networks. The underlying optimization problems reduce to either linear programs (LP) or semidefinite programs (SDP). We show how to exploit structural properties of the network, such as sparsity, to significantly reduce the computational cost; this is especially useful for convolutional and pruned neural networks. In experiments on networks with random weights and on networks trained on MNIST, we find that for the $\ell_\infty$-Lipschitz constant our approach yields tighter estimates than baselines available in the literature.
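To give a concrete sense of the kind of bound involved, the sketch below is an illustrative toy example, not the LiPopt implementation: for a one-hidden-layer ReLU network $f(x) = w_2^\top \mathrm{ReLU}(W_1 x)$, the gradient is $W_1^\top \mathrm{diag}(s)\, w_2$ with $s$ the vector of activation derivatives, which lie in $[0,1]$. Relaxing $s$ to the box $[0,1]^m$ gives the upper bound $L_\infty \le \max_{s \in [0,1]^m} \|W_1^\top \mathrm{diag}(s)\, w_2\|_1$, a polynomial optimization problem over a box of the type LiPopt bounds via LP/SDP hierarchies. The network sizes and the brute-force vertex enumeration below are purely illustrative; they stand in for the hierarchy only because the toy problem is small enough to solve exactly.

```python
import itertools
import numpy as np

# Toy sketch (assumed setup, not the authors' code): a one-hidden-layer
# ReLU network f(x) = w2^T relu(W1 x) with small, random weights.
rng = np.random.default_rng(0)
m, d = 6, 4                      # hidden width, input dimension (toy sizes)
W1 = rng.standard_normal((m, d))
w2 = rng.standard_normal(m)

# Naive bound: product of layer-wise operator norms (often loose).
# ||W1||_inf is the max absolute row sum, i.e. the l_inf -> l_inf norm.
naive = np.linalg.norm(W1, ord=np.inf) * np.abs(w2).sum()

# Box-relaxation bound: max over s in [0,1]^m of ||W1^T diag(s) w2||_1.
# The objective is convex in s, so the maximum is attained at a vertex of
# the box; with m = 6 we can simply enumerate all 2^m vertices.
box = max(
    np.abs(W1.T @ (np.array(s) * w2)).sum()
    for s in itertools.product([0, 1], repeat=m)
)

print(f"product-of-norms bound: {naive:.3f}")
print(f"box-relaxation bound:   {box:.3f}")  # never exceeds the naive bound
```

For realistic network sizes this enumeration is intractable; the point of the LP/SDP hierarchies (and of exploiting sparsity in the network) is to bound the same box-constrained polynomial problem without it.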
