A study on replay attack and anti-spoofing for text-dependent speaker verification

Replay, which is to playback a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. In this study, we evaluate the vulnerability of text-dependent speaker verification systems under the replay attack using a standard benchmarking database, and also propose an anti-spoofing technique to safeguard the speaker verification systems. The key idea of the spoofing detection technique is to decide whether the presented sample is matched to any previous stored speech samples based a similarity score. The experiments conducted on the RSR2015 database showed that the equal error rate (EER) and false acceptance rate (FAR) increased from both 2.92 % to 25.56 % and 78.36 % respectively as a result of the replay attack. It confirmed the vulnerability of speaker verification to replay attacks. On the other hand, our proposed spoofing countermeasure was able to reduce the FARs from 78.36 % and 73.14 % to 0.06 % and 0.0 % for male and female systems, respectively, in the face of replay spoofing. The experiments confirmed the effectiveness of the proposed anti-spoofing technique.

[1]  Haizhou Li,et al.  An overview of text-independent speaker recognition: From features to supervectors , 2010, Speech Commun..

[2]  Avery Wang,et al.  An Industrial Strength Audio Search Algorithm , 2003, ISMIR.

[3]  Eduardo Lleida,et al.  Preventing replay attacks on speaker verification systems , 2011, 2011 Carnahan Conference on Security Technology.

[4]  Tomi Kinnunen,et al.  I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry , 2013, INTERSPEECH.

[5]  Ibon Saratxaga,et al.  Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Haizhou Li,et al.  A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[7]  Bin Ma,et al.  The RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases , 2012, Interspeech 2012.

[8]  Zhizheng Wu,et al.  Voice conversion and spoofing attack on speaker verification systems , 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.

[9]  Haizhou Li,et al.  Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints , 2013, INTERSPEECH.

[10]  Driss Matrouf,et al.  Artificial impostor voice transformation effects on false acceptance rates , 2007, INTERSPEECH.

[11]  Nicholas W. D. Evans,et al.  Spoofing countermeasures to protect automatic speaker verification from voice conversion , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[12]  Gang Wei,et al.  Channel pattern noise based playback attack detection algorithm for speaker recognition , 2011, 2011 International Conference on Machine Learning and Cybernetics.

[13]  Wei Shang,et al.  Score normalization in playback attack detection , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Mats Blomberg,et al.  Vulnerability in speaker verification - a study of technical impostor techniques , 1999, EUROSPEECH.

[15]  Dat Tran,et al.  Testing Voice Mimicry with the YOHO Speaker Verification Corpus , 2005, KES.

[16]  Nicholas W. D. Evans,et al.  Re-assessing the threat of replay spoofing attacks against automatic speaker verification , 2014, 2014 International Conference of the Biometrics Special Interest Group (BIOSIG).

[17]  Chng Eng Siong,et al.  Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Keiichi Tokuda,et al.  A robust speaker verification system against imposture using an HMM-based speech synthesis system , 2001, INTERSPEECH.

[19]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[20]  M. Wagner,et al.  Vulnerability of speaker verification to voice mimicking , 2004, Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004..

[21]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[22]  Eduardo Lleida,et al.  Detecting Replay Attacks from Far-Field Recordings on Speaker Verification Systems , 2011, BIOID.

[23]  Tomi Kinnunen,et al.  Spoofing and countermeasures for automatic speaker verification , 2013, INTERSPEECH.

[24]  Hagai Aronowitz,et al.  Voice transformation-based spoofing of text-dependent speaker verification systems , 2013, INTERSPEECH.

[25]  Samy Bengio,et al.  Can a Professional Imitator Fool a GMM-Based Speaker Verification System? , 2005 .