Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge

This paper presents an experimental comparison of different features for the detection of replay spoofing attacks in Automatic Speaker Verification systems. We evaluate the proposed countermeasures using two recently introduced databases, including the dataset provided for the ASVspoof 2017 challenge. This challenge provides researchers with a common framework for the evaluation of replay attack detection systems, with a particular focus on the generalization to new, unknown conditions (for instance, replay devices different from those used during system training). Our cross–database experiments show that, although achieving this level of generalization is indeed a challenging task, it is possible to train classifiers that exhibit stable and consistent results across different experiments. The proposed approach for the ASVspoof 2017 challenge consists in the score–level fusion of several base classifiers using logistic regression. These base classifiers are 2–class Gaussian Mixture Models (GMMs) representing genuine and spoofed speech respectively. Our best system achieves an Equal Error Rate of 10.52% on the challenge evaluation set. As a result of this set of experiments, we provide some general conclusions regarding feature extraction for replay attack detection and identify which features show the most promising results.

[1]  Gang Wei,et al.  Channel pattern noise based playback attack detection algorithm for speaker recognition , 2011, 2011 International Conference on Machine Learning and Cybernetics.

[2]  Nicholas W. D. Evans,et al.  Re-assessing the threat of replay spoofing attacks against automatic speaker verification , 2014, 2014 International Conference of the Biometrics Special Interest Group (BIOSIG).

[3]  Tomi Kinnunen,et al.  A comparison of features for synthetic speech detection , 2015, INTERSPEECH.

[4]  John H. L. Hansen,et al.  UTD-CRSS Systems for 2018 NIST Speaker Recognition Evaluation , 2019, ICASSP.

[5]  John H. L. Hansen,et al.  CRSS systems for 2012 NIST Speaker Recognition Evaluation , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Goutam Saha,et al.  Overview of BTAS 2016 speaker anti-spoofing competition , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[7]  Nicholas W. D. Evans,et al.  Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification , 2017, Comput. Speech Lang..

[8]  Eduardo Lleida,et al.  Detecting Replay Attacks from Far-Field Recordings on Speaker Verification Systems , 2011, BIOID.

[9]  Sébastien Marcel,et al.  Cross-Database Evaluation of Audio-Based Spoofing Detection Systems , 2016, INTERSPEECH.

[10]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[11]  Kong-Aik Lee,et al.  RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Goutam Saha,et al.  Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks , 2008 .

[13]  Kuldip K. Paliwal,et al.  Spectral subband centroid features for speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[14]  Sébastien Marcel,et al.  On the vulnerability of speaker verification to realistic voice spoofing , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[15]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[17]  Vidhyasaharan Sethu,et al.  Investigation of spectral centroid features for cognitive load classification , 2011, Speech Commun..

[18]  S. Furui,et al.  Cepstral analysis technique for automatic speaker verification , 1981 .

[19]  Eliathamby Ambikairajah,et al.  Investigation of Spectral Centroid Magnitude and Frequency for Speaker Recognition , 2010, Odyssey.

[20]  Ibon Saratxaga,et al.  Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Haizhou Li,et al.  A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[22]  M. Picheny,et al.  Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .

[23]  Bin Ma,et al.  The reddots data collection for speaker recognition , 2015, INTERSPEECH.

[24]  Haizhou Li,et al.  Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..