Combining Phase-based Features for Replay Spoof Detection System

Automatic Speaker Verification (ASV) systems verify the claimed identity of a speaker based on speech samples. Technological advances have made practical ASV systems possible, which in turn has exposed their vulnerability to spoofing attacks. Replay is one such attack, in which the ASV system is fooled with pre-recorded speech samples of a target speaker. In this context, both magnitude-based and phase-based spectral features are affected by the quality of the intermediate devices and their environments. Only a few studies have reported detecting replay attacks using phase features. In this paper, we explore the relative significance of various phase-based features for detecting replay attacks. Magnitude-based features are also chosen for score-level fusion with the phase-based features, to capture possible complementary information. Among the various combinations of magnitude-based and phase-based features, the best Equal Error Rate (EER) we obtain is 12.25%, which is lower than that obtained with any individual feature set, while score-level fusion of the phase-based features gives an EER of 13.14% on the evaluation set of the ASVspoof 2017 version 1 database.
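To make the two quantities in the abstract concrete, the following is a minimal sketch of score-level fusion and EER computation. It is an illustration under stated assumptions, not the authors' implementation: the function names `fuse_scores` and `equal_error_rate`, the weighted-sum fusion rule, the fusion weight, and the toy scores are all hypothetical, and higher scores are assumed to indicate genuine (non-replayed) speech.

```python
from typing import List, Sequence


def fuse_scores(scores_a: Sequence[float],
                scores_b: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted-sum score-level fusion of two systems (assumed fusion rule).

    Each position holds the scores both systems gave to the same trial.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(scores_a, scores_b)]


def equal_error_rate(genuine: Sequence[float],
                     spoof: Sequence[float]) -> float:
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted as genuine when its score is >= the threshold.
    The EER is taken at the threshold where the false acceptance rate
    (spoof accepted) and false rejection rate (genuine rejected) are closest.
    """
    thresholds = sorted(set(genuine) | set(spoof))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in spoof) / len(spoof)
        frr = sum(g < t for g in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration (not from the paper).
    phase_genuine = [0.9, 0.8, 0.3, 0.7]
    phase_spoof = [0.1, 0.2, 0.6, 0.75]
    magnitude_genuine = [0.6, 0.7, 0.9, 0.8]
    magnitude_spoof = [0.2, 0.3, 0.1, 0.4]

    fused_genuine = fuse_scores(phase_genuine, magnitude_genuine)
    fused_spoof = fuse_scores(phase_spoof, magnitude_spoof)

    print("phase-only EER:", equal_error_rate(phase_genuine, phase_spoof))
    print("fused EER:", equal_error_rate(fused_genuine, fused_spoof))
```

In practice the fusion weight would be tuned on a development set, and the threshold sweep would usually be replaced by interpolation on the DET curve; both choices here are simplifications.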
