Adversarial Examples for Automatic Speech Recognition: Attacks and Countermeasures

Speech is a common and effective medium for communication between humans and modern mobile devices such as smartphones and home hubs. Remarkable advances in computing and networking have popularized automatic speech recognition (ASR) systems, which interpret speech signals received on mobile devices and enable us to control and interact with those devices remotely. Despite this promising development, audio adversarial examples, a new kind of attack on advanced ASR systems, have proven extremely effective at imitating human speech while fooling mobile devices into executing incorrect commands. In this article, we provide a systematic survey of audio adversarial examples in the literature. We first present an overview of the architecture of ASR systems and outline the basic attack philosophy. We then briefly introduce state-of-the-art techniques for generating audio adversarial examples and present a comprehensive comparison among them. Finally, after discussing existing countermeasures for defending ASR systems, we highlight several promising research directions and open challenges in constructing more robust and practical audio adversarial examples.
