Paper information - Performance evaluation of front- and back-end techniques for ASV spoofing detection systems based on deep features

As Automatic Speaker Verification (ASV) becomes more popular, so do the ways impostors can gain illegal access to speech-based biometric systems. For instance, impostors can use Text-to-Speech (TTS) and Voice Conversion (VC) techniques to generate speech that resembles the voice of a genuine user and thereby gain fraudulent access to the system. To prevent this, a number of anti-spoofing countermeasures have been developed for detecting these high-technology attacks. However, detecting unforeseen spoofing attacks remains challenging. To address this issue, this work presents an extensive empirical investigation of which speech features and back-end classifiers provide the best overall performance for an anti-spoofing system based on a deep learning framework. In this architecture, a deep neural network extracts a single identity spoofing vector per utterance from the speech features. The extracted vectors are then passed to a classifier, which makes the final detection decision. Experimental evaluation is carried out on the standard ASVspoof 2015 corpus. The results show that classical FBANK features and Linear Discriminant Analysis (LDA) obtain the best performance for the proposed system.
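The two-stage pipeline described in the abstract (one vector per utterance, then a back-end classifier) can be illustrated with a minimal sketch. This is not the paper's implementation: the deep neural network is replaced here by simple mean pooling over frames, the frame features are synthetic Gaussian data rather than FBANK features, and all function names (`make_utterance`, `embed`, `fit_lda`, `make_set`) are hypothetical. Only the back-end, a two-class Fisher LDA with equal class priors, corresponds to a technique named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_utterance(shift, n_frames=50, dim=8):
    # Synthetic frame-level features (assumption: stands in for real FBANK frames).
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, dim))

def embed(frames):
    # Stand-in for the DNN: collapse all frames into one vector per utterance.
    return frames.mean(axis=0)

def make_set(n_per_class):
    # Label 1 = genuine (frame mean 0.0), label 0 = spoofed (frame mean 0.5).
    genuine = [embed(make_utterance(0.0)) for _ in range(n_per_class)]
    spoofed = [embed(make_utterance(0.5)) for _ in range(n_per_class)]
    X = np.stack(genuine + spoofed)
    y = np.array([1] * n_per_class + [0] * n_per_class)
    return X, y

def fit_lda(X, y):
    # Two-class Fisher LDA with equal priors: score(x) = w @ x + b.
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for numerical stability.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw / (len(X) - 2) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

X_train, y_train = make_set(40)
X_test, y_test = make_set(40)

w, b = fit_lda(X_train, y_train)
scores = X_test @ w + b            # positive score means "genuine"
predictions = (scores > 0).astype(int)
accuracy = float(np.mean(predictions == y_test))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

On real data the `embed` step would be replaced by the trained network, and the decision threshold would normally be tuned on a development set rather than fixed at zero.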

Ángel M. Gómez | Antonio M. Peinado | Alejandro Gómez Alanís | José Andrés González López
