Detecting adversarial manipulation using inductive Venn-ABERS predictors

Abstract Inductive Venn-ABERS predictors (IVAPs) are probabilistic predictors with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property to detect adversarial examples in binary classification tasks. By rejecting predictions when the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed both on adversarial examples crafted against the underlying model and on adversarial examples generated with knowledge of the IVAP itself. The method appears to offer competitive robustness compared to the state of the art in adversarial defense, yet it is computationally far more tractable.
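
As a minimal illustration of the rejection rule described in the abstract (a sketch, not the paper's exact implementation), the following Python snippet computes a Venn-ABERS multiprobability interval (p0, p1) for a single test score using scikit-learn's isotonic regression and abstains whenever the interval is wider than a threshold; the helper names and the threshold value are assumptions made for illustration.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Return the (p0, p1) interval for one test score, given calibration data."""
    probs = []
    for hypothetical_label in (0, 1):
        # Augment the calibration set with the test point under each possible label.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, hypothetical_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        probs.append(float(iso.predict([test_score])[0]))
    return probs[0], probs[1]  # lower and upper probability of the positive class

def predict_or_reject(cal_scores, cal_labels, test_score, width_threshold=0.05):
    """Reject (return None) when the IVAP interval is too wide, i.e. uncertainty is high."""
    p0, p1 = venn_abers_interval(cal_scores, cal_labels, test_score)
    if p1 - p0 > width_threshold:
        return None  # treat as a suspected adversarial input
    merged = p1 / (1.0 - p0 + p1)  # standard merging of the two probabilities
    return int(merged >= 0.5)

Here cal_scores would be the underlying binary classifier's scores on a held-out calibration set and cal_labels the corresponding ground-truth labels; a rejected prediction is flagged rather than classified.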
