We propose a scheme that mitigates the adversarial perturbation $\epsilon$ in an adversarial example $X_{adv} = X \pm \epsilon$, where $X$ is a benign sample, by subtracting the estimated perturbation $\hat{\epsilon}$ from $X + \epsilon$ and adding $\hat{\epsilon}$ to $X - \epsilon$. The estimated perturbation $\hat{\epsilon}$ is the difference between $X_{adv}$ and its moving-averaged version $W_{avg} * X_{adv}$, where $W_{avg}$ is an $N \times N$ moving-average kernel whose coefficients are all one. Because adjacent samples of an image are usually close to each other, we can assume $X \approx W_{avg} * X$; we name this relation X-MAS (X minus Moving-Averaged Samples). Under this assumption, the estimated perturbation $\hat{\epsilon}$ falls within the range of $\epsilon$. The scheme is further extended to multi-level mitigation by treating the mitigated adversarial example $X_{adv} \pm \hat{\epsilon}$ as a new adversarial example to be mitigated. Multi-level mitigation brings $X_{adv}$ closer to $X$, i.e., to a smaller (mitigated) perturbation than the original unmitigated one, by using the moving-averaged adversarial sample $W_{avg} * X_{adv}$ (which carries a smaller perturbation than $X_{adv}$ when $X \approx W_{avg} * X$) as a boundary that the mitigation cannot cross: a sample being decreased cannot go below it, and a sample being increased cannot go beyond it. With multi-level mitigation, we obtain high prediction accuracy even for adversarial examples with a large perturbation (e.g., $\epsilon > 16$). The proposed scheme is evaluated on adversarial examples crafted by FGSM (Fast Gradient Sign Method)-based attacks against ResNet-50 trained on the ImageNet dataset.
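To make the procedure concrete, the following is a minimal sketch of the mitigation as described above, not the authors' reference implementation. It assumes float images of shape (H, W, C) in [0, 255], realizes the $N \times N$ moving average with SciPy's uniform_filter over the spatial axes, keeps $\hat{\epsilon}$ signed so that "subtract from $X + \epsilon$, add to $X - \epsilon$" reduces to a single per-pixel subtraction, and reads the boundary condition as per-pixel clipping at $W_{avg} * X_{adv}$; the function names and the kernel size $N = 3$ are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def moving_average(x, n=3):
    # W_avg * x: N x N moving average over the spatial axes only.
    return uniform_filter(x, size=(n, n, 1), mode="nearest")

def estimate_perturbation(x, n=3):
    # eps_hat = X_adv - W_avg * X_adv; under X ~= W_avg * X this stays
    # within the range of the true perturbation eps.
    return x - moving_average(x, n)

def xmas_mitigate(x_adv, n=3, levels=1):
    # Multi-level mitigation: each level treats the current (already
    # mitigated) image as a new adversarial example, re-estimates eps_hat,
    # and subtracts it, while W_avg * X_adv serves as the boundary that
    # the mitigation is not allowed to cross.
    x_adv = x_adv.astype(np.float64)
    boundary = moving_average(x_adv, n)
    x = x_adv.copy()
    for _ in range(levels):
        x = x - estimate_perturbation(x, n)
        # Pixels that start above the boundary may not fall below it,
        # and pixels that start below it may not rise above it.
        x = np.where(x_adv >= boundary,
                     np.maximum(x, boundary),
                     np.minimum(x, boundary))
    return np.clip(x, 0.0, 255.0)
```

In the evaluation setting of the abstract, such a routine would be applied to an FGSM-perturbed image before it is fed to the ResNet-50 classifier.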