Attack on Practical Speaker Verification System Using Universal Adversarial Perturbations

In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies played an audio adversarial example as a digital signal to perform physical attacks, which would be easily rejected by audio replay detection modules. This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker. A two-step algorithm is proposed to optimize the universal adversarial perturbation to be text-independent and has little effect on the authentication text recognition. We also estimated room impulse response (RIR) in the algorithm which allowed the perturbation to be effective after being played over the air. In the physical experiment, we achieved targeted attacks with success rate of 100%, while the word error rate (WER) on speech recognition was only increased by 3.55%. And recorded audios could pass replay detection for the live person speaking.

[1]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Jianwei Yu,et al.  Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Joon Son Chung,et al.  In defence of metric learning for speaker recognition , 2020, INTERSPEECH.

[4]  Bo Yuan,et al.  Real-Time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Douglas A. Reynolds,et al.  An overview of automatic speaker recognition technology , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Joon Son Chung,et al.  Voxceleb: Large-scale speaker verification in the wild , 2020, Comput. Speech Lang..

[7]  Yang Liu,et al.  Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems , 2019, ArXiv.

[8]  Jozef Magdolen,et al.  Voice control of smart home by using Google Cloud Speech-To-Text API , 2018 .

[9]  Wen Gao,et al.  Universal Adversarial Perturbations Generative Network For Speaker Recognition , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[10]  Ladislav Mošner,et al.  Building and Evaluation of a Real Room Impulse Response Dataset , 2018, IEEE Journal of Selected Topics in Signal Processing.

[11]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[12]  Sanjeev Khudanpur,et al.  X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Guy-Bart Stan,et al.  Comparison of different impulse response measurement techniques , 2002 .

[14]  Thomas Fang Zheng,et al.  Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019 , 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[15]  Haibin Wu,et al.  Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[16]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[17]  Bo Yuan,et al.  Practical Adversarial Attacks Against Speaker Recognition Systems , 2020, HotMobile.

[18]  Joon Son Chung,et al.  Utterance-level Aggregation for Speaker Recognition in the Wild , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Wen Gao,et al.  Learning to Fool the Speaker Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Sanjeev Khudanpur,et al.  A study on data augmentation of reverberant speech for robust speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).