Abstract

Deep neural networks (DNNs) perform well on recognition and prediction tasks such as image recognition, speech recognition, video recognition, and pattern analysis. However, adversarial examples, created by adding a small amount of noise to original samples, pose a serious threat because they can cause a DNN to misclassify its input. Adversarial examples have been studied primarily for images, but their effect on audio is now drawing considerable interest as well. For example, adding a small distortion, imperceptible to humans, to an original audio sample yields an audio adversarial example that sounds error-free to humans but is misinterpreted by a machine. A defense against audio adversarial examples is therefore needed. In this paper, we propose an acoustic-decoy method for detecting audio adversarial examples. Its key feature is that it applies well-formalized distortions through audio modification that are sufficient to change the classification result of an adversarial example but do not affect the classification result of an original sample. Experimental results show that the proposed scheme detects adversarial examples by reducing the similarity rate for an adversarial example to 6.21%, 1.27%, and 0.66% using low-pass filtering (with 12 dB roll-off), 8-bit reduction, and audio silence removal, respectively. By comparing the result with that of the initial audio sample, it detects audio adversarial examples with a success rate of 97%.
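The detection idea described in the abstract (modify the audio, classify it again, and compare the result with that of the unmodified input) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `transcribe` callable standing in for a speech recognizer, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.5 threshold are all assumptions made for illustration. Only 8-bit reduction is shown; low-pass filtering or silence removal would be passed in as `modify` in the same way.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence


def reduce_bit_depth(samples: Sequence[float], bits: int = 8) -> List[float]:
    """Quantize samples in [-1.0, 1.0] onto a uniform grid of step 2**-(bits - 1).

    A simplified stand-in for the "8-bit reduction" modification (assumption).
    """
    levels = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(s * levels) / levels)) for s in samples]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; the paper's exact metric may differ."""
    return SequenceMatcher(None, a, b).ratio()


def looks_adversarial(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # hypothetical recognizer
    modify: Callable[[Sequence[float]], List[float]] = reduce_bit_depth,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Flag the input if its transcription changes sharply after modification."""
    before = transcribe(samples)
    after = transcribe(modify(samples))
    return similarity(before, after) < threshold
```

A benign sample should transcribe to nearly the same text before and after modification, so its similarity stays high, whereas an adversarial sample whose perturbation is destroyed by the modification produces a different transcription and falls below the threshold.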