PROVEN: Certifying Robustness of Neural Networks with a Probabilistic Approach

With deep neural networks providing state-of-the-art models for numerous machine learning tasks, quantifying the robustness of these models has become an important area of research. However, most of the research literature focuses on the \textit{worst-case} setting, in which the input of the neural network is perturbed by noise constrained within an $\ell_p$ ball, and several algorithms have been proposed to compute certified lower bounds on the minimum adversarial distortion based on such worst-case analysis. In this paper, we address this limitation and extend robustness verification to a \textit{probabilistic} setting in which the additive noise follows a given distributional characterization. We propose a novel probabilistic framework PROVEN to PRObabilistically VErify Neural networks with statistical guarantees -- i.e., PROVEN certifies the probability that the classifier's top-1 prediction cannot be altered under any constrained $\ell_p$-norm perturbation of a given input. Importantly, we show that closed-form probabilistic certificates can be derived from current state-of-the-art neural network robustness verification frameworks. Hence, the probabilistic certificates provided by PROVEN come naturally, with almost no overhead, when obtaining worst-case certified lower bounds from existing methods such as Fast-Lin, CROWN, and CNN-Cert. Experiments on small and large MNIST and CIFAR neural network models demonstrate that our probabilistic approach can achieve up to around a $75\%$ improvement in the robustness certificate, with at least $99.99\%$ confidence, compared with the worst-case robustness certificate delivered by CROWN.
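
To make the certified quantity above concrete, the following is a minimal formalization sketched from the abstract alone; the symbols $f$, $x_0$, $c$, $\mathcal{D}$, $\epsilon$, and $\alpha$ are our own notation rather than the paper's. Writing $f:\mathbb{R}^d \to \mathbb{R}^K$ for the classifier, $c$ for the top-1 class of the unperturbed input $x_0$, and $\mathcal{D}$ for a given noise distribution supported inside the $\ell_p$ ball of radius $\epsilon$, a PROVEN-style certificate guarantees a confidence level $1-\alpha$ such that
$$
\Pr_{\delta \sim \mathcal{D}}\Big[\operatorname*{arg\,max}_{k} f_k(x_0 + \delta) = c\Big] \;\ge\; 1 - \alpha,
\qquad \operatorname{supp}(\mathcal{D}) \subseteq \{\delta : \|\delta\|_p \le \epsilon\},
$$
whereas the worst-case certificates of Fast-Lin, CROWN, and CNN-Cert correspond to the special case $\alpha = 0$.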

[1] Fabio Roli et al. Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, 2018, CCS.

[2] Yee Whye Teh et al. A Statistical Approach to Assessing Neural Network Robustness, 2018, ICLR.

[3] S. Resnick. A Probability Path, 1999.

[4] Wenbo Guo et al. Adversary Resistant Deep Neural Networks with an Application to Malware Detection, 2016, KDD.

[5] Seyed-Mohsen Moosavi-Dezfooli et al. Robustness of classifiers: from adversarial to random noise, 2016, NIPS.

[6] Matthias Hein et al. Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation, 2017, NIPS.

[7] Radha Poovendran et al. Google's Cloud Vision API is Not Robust to Noise, 2017, IEEE International Conference on Machine Learning and Applications (ICMLA).

[8] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[9] Aravind Srinivasan et al. Randomized Distributed Edge Coloring via an Extension of the Chernoff-Hoeffding Bounds, 1997, SIAM J. Comput.

[10] John C. Duchi et al. Certifiable Distributional Robustness with Principled Adversarial Training, 2017, arXiv.

[11] Swarat Chaudhuri et al. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation, 2018, IEEE Symposium on Security and Privacy (SP).

[12] Suman Jana et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2019, IEEE Symposium on Security and Privacy (SP).

[13] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[14] Bhanukiran Vinzamuri et al. Is Ordered Weighted $\ell_1$ Regularized Regression Robust to Adversarial Perturbation? A Case Study on OSCAR, 2018, IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[15] J. Zico Kolter et al. Provable defenses against adversarial examples via the convex outer adversarial polytope, 2017, ICML.

[16] Bernard Ghanem et al. Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Pushmeet Kohli et al. A Dual Approach to Scalable Verification of Deep Networks, 2018, UAI.

[18] Max Welling et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[19] Omar Fawzi et al. Robustness of classifiers to uniform $\ell_p$ and Gaussian noise, 2018, AISTATS.

[20] Jinfeng Yi et al. Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach, 2018, ICLR.

[21] Hamza Fawzi et al. Adversarial vulnerability for any classifier, 2018, NeurIPS.

[22] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[23] Moustapha Cissé et al. Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples, 2017.

[24] Inderjit S. Dhillon et al. Towards Fast Computation of Certified Robustness for ReLU Networks, 2018, ICML.

[25] H. Teicher et al. Probability theory: Independence, interchangeability, martingales, 1978.

[26] Cho-Jui Hsieh et al. Efficient Neural Network Robustness Certification with General Activation Functions, 2018, NeurIPS.

[27] Aleksander Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[28] Aditi Raghunathan et al. Certified Defenses against Adversarial Examples, 2018, ICLR.

[29] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[30] Sijia Liu et al. CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks, 2018, AAAI.

[31] Mykel J. Kochenderfer et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, 2017, CAV.

[32] Yvan Saeys et al. Lower bounds on the robustness to adversarial perturbations, 2017, NIPS.

[33] Rüdiger Ehlers. Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks, 2017, ATVA.

[34] Desh Ranjan et al. Balls and bins: A study in negative dependence, 1996, Random Struct. Algorithms.

[35] Pushmeet Kohli et al. Verification of deep probabilistic models, 2018, arXiv.