Adversarial Defense for Deep Speaker Recognition Using Hybrid Adversarial Training

Deep neural network (DNN)-based speaker recognition systems can easily be deceived by an adversary using minuscule, imperceptible perturbations of the input speech. Such adversarial attacks pose serious security threats to speaker recognition systems that rely on speech biometrics. To address this concern, we propose a new defense mechanism based on a hybrid adversarial training (HAT) setup. In contrast to existing countermeasures against adversarial attacks in deep speaker recognition, which exploit only class-boundary information via the supervised cross-entropy (CE) loss, we exploit additional supervised and unsupervised cues to craft diverse and stronger perturbations for adversarial training. Specifically, we employ multi-task objectives combining CE, feature-scattering (FS), and margin losses to generate adversarial perturbations, and include these perturbations during adversarial training to enhance the robustness of the model. We conduct speaker recognition experiments on the LibriSpeech dataset and compare against state-of-the-art projected gradient descent (PGD)-based adversarial training, which employs only the CE objective. The proposed HAT improves adversarial accuracy by an absolute 3.29% and 3.18% against PGD and Carlini-Wagner (CW) attacks, respectively, while retaining high accuracy on benign examples.
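
To make the perturbation-crafting step concrete, below is a minimal PyTorch-style sketch of the PGD inner loop driven by the multi-task objective. It assumes a hypothetical model f that returns both a speaker embedding and class logits; the loss weights, the simple L2 surrogate used in place of the optimal-transport-based FS loss, and all hyperparameter values are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def hybrid_pgd_attack(f, x, y, eps=0.002, alpha=0.0005, steps=10,
                          w_ce=1.0, w_fs=1.0, w_m=1.0, margin=10.0):
        # Craft adversarial examples by ascending a multi-task objective:
        # cross-entropy (CE) + feature scattering (FS) + a CW-style margin loss.
        f.eval()
        with torch.no_grad():
            emb_clean, _ = f(x)          # reference embeddings of clean speech

        # Random start inside the L-infinity eps-ball.
        x_adv = x + eps * torch.empty_like(x).uniform_(-1.0, 1.0)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            emb_adv, logits = f(x_adv)

            # 1) Supervised class-boundary cue.
            loss_ce = F.cross_entropy(logits, y)

            # 2) Unsupervised cue: push perturbed embeddings away from the
            #    clean ones (an L2 surrogate for the OT-based FS loss).
            loss_fs = (emb_adv - emb_clean).pow(2).sum(dim=1).mean()

            # 3) Margin loss: gap between the true logit and the best wrong one.
            true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
            mask = F.one_hot(y, num_classes=logits.size(1)).bool()
            best_wrong = logits.masked_fill(mask, float('-inf')).max(dim=1).values
            loss_margin = torch.clamp(true_logit - best_wrong + margin, min=0).mean()

            # The attack maximizes CE and FS while shrinking the margin term.
            loss = w_ce * loss_ce + w_fs * loss_fs - w_m * loss_margin
            grad = torch.autograd.grad(loss, x_adv)[0]

            # PGD step: ascend along the gradient sign, project into the ball.
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)
        return x_adv.detach()

In the outer training loop, the network parameters would then be updated with the usual CE loss on these hybrid adversarial examples, mirroring standard PGD adversarial training; since x may be raw waveforms or spectral features, any valid-range clamping of the perturbed input is left to the caller.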
