Certified Defenses against Adversarial Examples

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but these are often defeated soon after by new, stronger attacks. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack perturbing each pixel by at most ε = 0.1 can cause more than 35% test error.
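
The training pattern the abstract describes, a differentiable certificate optimized jointly with the network weights, can be sketched as follows. This is a minimal illustration assuming PyTorch, not the paper's exact SDP-based bound: the matrix M built below is a hypothetical placeholder for the paper's dual matrix, and the names `hidden`, `output`, `eps`, and `reg_weight` are assumptions introduced here. It only shows the mechanism: a largest-eigenvalue term computed from the weights is added to the classification loss, and gradients flow through it into the network.

```python
# Minimal sketch (PyTorch assumed) of jointly optimizing a differentiable
# certificate-style term with a one-hidden-layer network. The matrix M is a
# hypothetical placeholder, NOT the paper's SDP dual matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_in, d_hidden, n_classes = 784, 256, 10   # MNIST-sized one-hidden-layer net
eps, reg_weight = 0.1, 1.0                 # assumed hyperparameters

hidden = nn.Linear(d_in, d_hidden)                    # first-layer weights W
output = nn.Linear(d_hidden, n_classes, bias=False)   # second-layer weights V

def forward(x):
    return output(torch.relu(hidden(x)))

def certificate_term():
    # Placeholder weight-dependent bound: for A = V @ W, the largest
    # eigenvalue of the symmetric block matrix [[0, A], [A^T, 0]] equals
    # the spectral norm of A. eigvalsh is differentiable, so this term
    # back-propagates into both layers when added to the loss.
    A = output.weight @ hidden.weight                   # (n_classes, d_in)
    k, d = A.shape
    top = torch.cat([torch.zeros(k, k), A], dim=1)
    bottom = torch.cat([A.t(), torch.zeros(d, d)], dim=1)
    M = torch.cat([top, bottom], dim=0)
    lam_max = torch.linalg.eigvalsh(M)[-1]              # eigenvalues ascend
    return eps * lam_max                                # assumed scaling

params = list(hidden.parameters()) + list(output.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def train_step(x, y):
    optimizer.zero_grad()
    loss = F.cross_entropy(forward(x), y) + reg_weight * certificate_term()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random tensors standing in for an MNIST batch:
x = torch.randn(64, d_in)
y = torch.randint(0, n_classes, (64,))
print(train_step(x, y))
```

In this sketch the certificate term depends only on the weights, so a single eigenvalue computation per step regularizes all examples; the paper's actual semidefinite-relaxation certificate is tighter and is what yields the reported guarantee of at most 35% test error under ε = 0.1 perturbations.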
