An Investigation of Deep-Learning Frameworks for Speaker Verification Antispoofing

In this study, we explore the use of deep-learning approaches for spoofing detection in speaker verification. Most recently successful spoofing detection systems rely on hand-crafted features that encode specific prior knowledge of spoofing attacks, which may limit their ability to generalize to unseen attacks. We instead investigate genuine-versus-spoofed discrimination at the back-end stage, drawing on recent advances in deep-learning research. In this paper, alternative network architectures are explored for detecting spoofed speech. Based on this analysis, we propose a novel spoofing detection system that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this framework, the CNN acts as a convolutional feature extractor applied to the speech input, and recurrent networks operating on the CNN output capture long-term dependencies across the time domain. Novel features, namely the Teager energy operator critical band autocorrelation envelope and the perceptual minimum variance distortionless response, are investigated alongside the more general spectrogram as inputs to the proposed deep-learning frameworks. Experiments on the ASVspoof 2015 corpus show that the integrated CNN–RNN framework achieves state-of-the-art single-system performance, and score-level fusion further improves system robustness. A detailed analysis shows that the proposed approach can potentially mitigate the problems posed by short-duration test utterances, which also occur in the evaluation corpus.
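
The CNN–RNN idea described above can be illustrated with a minimal, untrained sketch in pure Python. It assumes a 1-D convolution along time whose kernels span the full feature dimension, a ReLU nonlinearity, a single-layer Elman (tanh) RNN, and a sigmoid score read from the last hidden state. The layer sizes, the function names (`conv1d_time`, `rnn_last_state`, `spoof_score`) and the seeded random weights are hypothetical choices for illustration, not the configuration used in the paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    """Small random weights (hypothetical init; this sketch is never trained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def conv1d_time(x: Matrix, filters: List[Matrix], bias: List[float]) -> Matrix:
    """'Valid' 1-D convolution along time, followed by ReLU.

    x is T frames x D dims; each filter is K frames x D dims.
    Returns (T - K + 1) frames x F values, one per filter.
    """
    k = len(filters[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for filt, b in zip(filters, bias):
            acc = b
            for i in range(k):
                acc += sum(w * v for w, v in zip(filt[i], x[t + i]))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


def rnn_last_state(seq: Matrix, w_x: Matrix, w_h: Matrix, b: List[float]) -> List[float]:
    """Elman RNN, h_t = tanh(W_x y_t + W_h h_{t-1} + b); returns the final state."""
    h = [0.0] * len(b)
    for y in seq:
        # The comprehension reads the previous h before it is replaced.
        h = [
            math.tanh(
                b[j]
                + sum(w * v for w, v in zip(w_x[j], y))
                + sum(w * v for w, v in zip(w_h[j], h))
            )
            for j in range(len(b))
        ]
    return h


def spoof_score(x: Matrix, seed: int = 0, n_filters: int = 8,
                kernel: int = 3, hidden: int = 16) -> float:
    """CNN feature extractor -> RNN over time -> sigmoid score in (0, 1).

    All sizes and the seed-based weights are hypothetical placeholders.
    """
    if len(x) < kernel:
        raise ValueError("utterance must have at least `kernel` frames")
    rng = random.Random(seed)
    dim = len(x[0])
    filters = [rand_matrix(rng, kernel, dim) for _ in range(n_filters)]
    w_x = rand_matrix(rng, hidden, n_filters)
    w_h = rand_matrix(rng, hidden, hidden)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    feats = conv1d_time(x, filters, [0.0] * n_filters)       # CNN stage
    h_last = rnn_last_state(feats, w_x, w_h, [0.0] * hidden)  # RNN stage
    logit = sum(w * h for w, h in zip(w_out, h_last))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(42)
    # 50 frames x 20 dims of noise, standing in for e.g. a spectrogram.
    utterance = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
    print(f"score = {spoof_score(utterance):.4f}")
```

Because the recurrent stage folds a sequence of any length into a fixed-size hidden state, the same model accepts utterances of different durations. In a real system the weights would be learned from labelled genuine and spoofed speech rather than drawn at random.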

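One of the input features named above builds on the Teager energy operator, whose standard discrete form is Ψ[x(n)] = x²(n) − x(n−1)·x(n+1). The sketch below computes only this operator; the critical-band filtering and autocorrelation-envelope steps of the full feature are not reproduced, and the function name `teager_energy` is a hypothetical choice.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Defined only for interior samples, so the output has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    amp, omega = 2.0, 0.3
    tone = [amp * math.cos(omega * n + 0.5) for n in range(100)]
    psi = teager_energy(tone)
    # For a pure tone, psi is constant and equals amp^2 * sin^2(omega).
    print(min(psi), max(psi), amp ** 2 * math.sin(omega) ** 2)
```

For a pure tone A·cos(Ωn + φ), the operator output is the constant A²·sin²(Ω), so it reflects both the amplitude and the frequency of the oscillation rather than amplitude alone.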