On the Generalization of Fused Systems in Voice Presentation Attack Detection

This paper describes presentation attack detection systems developed for the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017). The submitted systems use calibration and score fusion to combine up to 18 sub-systems, which are based on eight state-of-the-art features and rely on Gaussian mixture model (GMM) and feed-forward neural network classifiers. These systems ranked among the top five in the competition. We present the proposed systems and analyze the calibration and fusion strategies employed. To assess the systems' generalization capacity, we evaluated them on an unrelated, larger database recorded in Portuguese, whereas the competition data is in English. These extended evaluation results show that the fusion-based system, although successful within the scope of the evaluation, lacks the ability to accurately discriminate genuine data from attacks in unknown conditions. This raises the question of how to assess the generalization ability of attack detection systems in practical application scenarios.
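The abstract names calibration and score fusion without detailing them. One common way to do both at once is linear logistic-regression fusion, in which a weighted sum of sub-system scores is trained so that its output behaves like a calibrated log-odds score. The sketch below assumes that technique; it is not necessarily the exact calibration or fusion configuration of the submitted systems. The function names `train_linear_fusion` and `fuse`, the label convention (1 = genuine, 0 = attack), the per-sub-system standardization, the plain gradient-descent training, and the synthetic scores are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch: linear logistic-regression fusion is an assumed
# technique here, not confirmed as the paper's exact fusion method.


def train_linear_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit a linear logistic-regression fusion of sub-system scores.

    scores: array of shape (n_trials, n_subsystems) with raw sub-system scores.
    labels: array of shape (n_trials,), assumed 1 = genuine, 0 = attack.
    Returns the weights, the bias and the per-sub-system normalization stats.
    """
    X = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Put every sub-system on a comparable scale before learning the weights.
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-12
    Z = (X - mean) / std
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Predicted probability that each trial is genuine.
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        # Gradient of the mean cross-entropy loss with respect to the logit.
        err = p - y
        w -= lr * (Z.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b, mean, std


def fuse(scores, w, b, mean, std):
    """Apply the learned fusion; the output approximates log posterior odds
    under the class proportions seen during training."""
    Z = (np.asarray(scores, dtype=float) - mean) / std
    return Z @ w + b


# Synthetic example: three sub-systems that each score genuine trials higher.
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 1.0, size=(200, 3))
attack = rng.normal(-1.0, 1.0, size=(200, 3))
train_scores = np.vstack([genuine, attack])
train_labels = np.concatenate([np.ones(200), np.zeros(200)])

w, b, mean, std = train_linear_fusion(train_scores, train_labels)
fused = fuse(train_scores, w, b, mean, std)
# With balanced classes, a threshold of 0 on the log-odds is the natural cut.
accuracy = np.mean((fused > 0) == (train_labels == 1))
print("weights:", w, "bias:", b, "accuracy:", accuracy)
```

Because the fusion weights and bias are fitted on one corpus, nothing guarantees they stay well calibrated on data from a different corpus or language, which is the kind of mismatch the Portuguese evaluation probes.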
