ResNet and Model Fusion for Automatic Spoofing Detection

Speaker verification systems have made great progress in recent years. Unfortunately, they remain highly vulnerable to various spoofing attacks, such as speech synthesis, voice conversion, and fake audio recordings. Inspired by the success of ResNet in image recognition, we investigated the effectiveness of ResNet for automatic spoofing detection. Experimental results on the ASVspoof 2017 dataset show that ResNet performs best among all single-model systems. Model fusion can further improve system performance. However, we found that fusing models that use the same feature yields little improvement. By using different features and models, our best fused system further reduced the Equal Error Rate (EER) by 18% relative to the best single-model system.
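To make the evaluation metric and the fusion step concrete, here is a minimal sketch in plain Python. The abstract does not specify which fusion rule was used, so this assumes a common score-level approach: a weighted sum of z-normalized per-model scores. It also assumes that a higher score means "more likely genuine". The helper names (`equal_error_rate`, `zscore`, `fuse_scores`, `relative_reduction`) and the toy scores are hypothetical illustrations, not the authors' code or ASVspoof 2017 data.

```python
from typing import List, Sequence


def equal_error_rate(scores: Sequence[float], is_genuine: Sequence[bool]) -> float:
    """Approximate the EER by sweeping the decision threshold over observed scores.

    Assumed convention: a higher score means "more likely genuine".
    FAR = fraction of spoofed trials accepted (score >= threshold).
    FRR = fraction of genuine trials rejected (score < threshold).
    Returns (FAR + FRR) / 2 at the threshold where |FAR - FRR| is smallest.
    """
    genuine = [s for s, g in zip(scores, is_genuine) if g]
    spoof = [s for s, g in zip(scores, is_genuine) if not g]
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in spoof) / len(spoof)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def zscore(scores: Sequence[float]) -> List[float]:
    """Normalize one model's scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    std = std or 1.0  # guard against a model that outputs a constant score
    return [(s - mean) / std for s in scores]


def fuse_scores(per_model: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Assumed score-level fusion: weighted sum of z-normalized scores per trial."""
    normed = [zscore(s) for s in per_model]
    return [sum(w * n[i] for w, n in zip(weights, normed)) for i in range(len(normed[0]))]


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative EER reduction; 0.18 corresponds to an 18% relative reduction."""
    return (baseline - improved) / baseline


# Toy scores for four trials (two genuine, two spoofed); not ASVspoof data.
labels = [True, True, False, False]
model_a = [0.9, 0.4, 0.5, 0.1]  # hypothetical model using feature A
model_b = [0.3, 0.8, 0.1, 0.6]  # hypothetical model using feature B

eer_a = equal_error_rate(model_a, labels)
eer_b = equal_error_rate(model_b, labels)
eer_fused = equal_error_rate(fuse_scores([model_a, model_b], [0.5, 0.5]), labels)
print(eer_a, eer_b, eer_fused)  # 0.5 0.5 0.0
```

The toy scores are chosen so that the two models err on different trials, and that complementarity is what lets the weighted sum separate the classes. With only four trials the EER is very coarse. The fusion weights are fixed by hand here; in practice they would typically be tuned on a development set. An 18% relative reduction means the fused EER equals 0.82 times the baseline EER.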
