Features and classifiers for replay spoofing attack detection

Automatic speaker verification (ASV) systems are known to be highly vulnerable to spoofing attacks. Various successful countermeasures have recently been proposed to detect spoofing attacks originating from speech synthesis (SS) and voice conversion (VC). However, detecting replay attacks, the most easily implementable spoofing attacks against ASV systems, has received less attention. In this paper, we therefore present an experimental comparison of various feature extraction techniques and classifiers for replay attack detection. In total, six magnitude spectrum based and three phase spectrum based features are used for feature extraction. For classification, four different techniques are used. Experiments are conducted on the recently released ASVspoof 2017 replay attack detection challenge data. Experimental results reveal that magnitude spectrum features considerably outperform phase-based features, independent of the classifier. Comparative results using the four classifiers indicate that i-vector cosine scoring yields lower equal error rates (EERs) than the other methods.
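The two quantities named at the end of the abstract, cosine scoring and the equal error rate, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `cosine_score` and `equal_error_rate` are hypothetical, the vectors stand in for i-vectors whose extraction is not shown, and the EER is approximated by a simple threshold sweep rather than by interpolation of the detection error trade-off curve.

```python
import numpy as np


def cosine_score(test_vec, model_vec):
    """Cosine similarity between a test vector and a class model vector."""
    test_vec = np.asarray(test_vec, dtype=float)
    model_vec = np.asarray(model_vec, dtype=float)
    norm = np.linalg.norm(test_vec) * np.linalg.norm(model_vec)
    return float(test_vec @ model_vec / norm)


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores indicate genuine speech.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2)
```

In a two-class detection setup, one plausible use (an assumption here, not a detail from the abstract) is to score each test vector against an average genuine vector and an average replay vector, use the difference as the detection score, and pass the resulting score lists to `equal_error_rate`.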
