Adversarial Separation Network for Speaker Recognition

Deep neural networks (DNN) have achieved great success in speaker recognition systems. However, it is observed that DNN based systems are easily deceived by adversarial examples leading to wrong predictions. Adversarial examples, which are generated by adding purposeful perturbations on natural examples, pose a serious security threat. In this study, we propose the adversarial separation network (AS-Net) to protect the speaker recognition system against adversarial attacks. Our proposed AS-Net is featured by its ability to separate adversarial perturbation from the test speech to restore the natural clean speech. As a standalone component, each input speech is pre-processed by AS-Net first. Furthermore, we incorporate the compression structure and the speaker quality loss to enhance the capacity of the AS-Net. Experimental results on the VCTK dataset demonstrated that the AS-Net effectively enhanced the robustness of speaker recognition systems against adversarial examples. It also significantly outperformed other state-of-the-art adversarial-detection mechanisms, including adversarial perturbation elimination network (APE-GAN), feature squeezing, and adversarial training.

[1]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[2]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Haibin Wu,et al.  Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  David A. Wagner,et al.  Audio Adversarial Examples: Targeted Attacks on Speech-to-Text , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[6]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[7]  John H. L. Hansen,et al.  Adversarial Regularization for End-to-End Robust Speaker Verification , 2019, INTERSPEECH.

[8]  Galina Lavrentyeva,et al.  Audio Replay Attack Detection with Deep Learning Frameworks , 2017, INTERSPEECH.

[9]  Jun Zhu,et al.  Boosting Adversarial Attacks with Momentum , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Pan He,et al.  Adversarial Examples: Attacks and Defenses for Deep Learning , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[11]  David A. Forsyth,et al.  SafetyNet: Detecting and Rejecting Adversarial Examples Robustly , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Yongdong Zhang,et al.  APE-GAN: Adversarial Perturbation Elimination with GAN , 2017, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Galina Lavrentyeva,et al.  STC Antispoofing Systems for the ASVspoof2019 Challenge , 2019, INTERSPEECH.

[14]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[15]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[16]  Nicholas W. D. Evans,et al.  Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals , 2012, INTERSPEECH.

[17]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[18]  Yanjun Qi,et al.  Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks , 2017, NDSS.

[19]  Judith C. Brown Calculation of a constant Q spectral transform , 1991 .

[20]  Michael P. Wellman,et al.  Towards the Science of Security and Privacy in Machine Learning , 2016, ArXiv.

[21]  Ananthram Swami,et al.  Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks , 2015, 2016 IEEE Symposium on Security and Privacy (SP).

[22]  Luiz Eduardo Soares de Oliveira,et al.  Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Luyu Wang,et al.  advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch , 2019, ArXiv.

[24]  Junichi Yamagishi,et al.  SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016 .

[25]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[26]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).