Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition

Speaker recognition is a popular topic in biometric authentication and many deep learning approaches have achieved extraordinary performances. However, it has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we aim to exploit this weakness to perform targeted adversarial attacks against the x-vector based speaker recognition system. We propose to generate inaudible adversarial perturbations achieving targeted white-box attacks to speaker recognition system based on the psychoacoustic principle of frequency masking. Specifically, we constrict the perturbation under the masking threshold of original audio, instead of using a common l_p norm to measure the perturbations. Experiments on Aishell-1 corpus show that our approach yields up to 98.5% attack success rate to arbitrary gender speaker targets, while retaining indistinguishable attribute to listeners. Furthermore, we also achieve an effective speaker attack when applying the proposed approach to a completely irrelevant waveform, such as music.

[1]  Bo Yuan,et al.  Real-Time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Wen Gao,et al.  Universal Adversarial Perturbations Generative Network For Speaker Recognition , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[3]  Shin Ishii,et al.  Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Dorothea Kolossa,et al.  Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding , 2018, NDSS.

[5]  Tomoki Toda,et al.  Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[6]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[7]  Hao Zheng,et al.  AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline , 2017, 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA).

[8]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[9]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[10]  Shen Li,et al.  Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[12]  Samy Bengio,et al.  Adversarial examples in the physical world , 2016, ICLR.

[13]  Mei-Yuh Hwang,et al.  Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[14]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[15]  John H. L. Hansen,et al.  Adversarial Regularization for End-to-End Robust Speaker Verification , 2019, INTERSPEECH.

[16]  Daniel Povey,et al.  MUSAN: A Music, Speech, and Noise Corpus , 2015, ArXiv.

[17]  Colin Raffel,et al.  Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition , 2019, ICML.

[18]  Lei Xie,et al.  Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[19]  Haizhou Li,et al.  Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Andries P. Hekstra,et al.  Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[21]  Haizhou Li,et al.  Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..

[22]  Haizhou Li,et al.  The Attacker's Perspective on Automatic Speaker Verification: An Overview , 2020, INTERSPEECH.

[23]  Moustapha Cissé,et al.  Fooling End-To-End Speaker Verification With Adversarial Examples , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  David A. Wagner,et al.  Audio Adversarial Examples: Targeted Attacks on Speech-to-Text , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[25]  Sanjeev Khudanpur,et al.  X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[26]  Lei Xie,et al.  Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27]  Lei Xie,et al.  Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition , 2019, INTERSPEECH.

[28]  John H. L. Hansen,et al.  Cross-lingual Text-independent Speaker Verification Using Unsupervised Adversarial Discriminative Domain Adaptation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[29]  Mei-Yuh Hwang,et al.  Training Augmentation with Adversarial Examples for Robust Speech Recognition , 2018, INTERSPEECH.

[30]  Aleksandr Sizov,et al.  ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge , 2015, INTERSPEECH.

[31]  Waleed H. Abdulla,et al.  Principles of Psychoacoustics , 2015 .