Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection

Replay attacks present a serious threat to Automatic Speaker Verification (ASV) systems. In this paper, we propose a novel replay detector based on Variable length Teager Energy Operator-Energy Separation Algorithm-Instantaneous Frequency Cosine Coefficients (VESA-IFCC) for the ASVspoof 2017 challenge. The key idea is to use the ESA to exploit the contribution of the instantaneous frequency (IF) to each subband energy, so as to capture possible changes in the spectral envelope of replayed speech caused by the transmission and channel characteristics of the replay device. The IF is computed from narrowband components of the speech signal, and the DCT is applied to the IF to obtain the proposed feature set. We compare the performance of the proposed VESA-IFCC feature set with features developed for detecting synthetic and voice-converted speech, namely CQCC, CFCCIF and prosody-based features. On the development set, the proposed VESA-IFCC features, when fused at score level with a variant of CFCCIF and prosody-based features, gave the lowest EER of 0.12 %. On the evaluation set, this combination gave an EER of 18.33 %. However, post-evaluation results of the challenge indicate that VESA-IFCC features alone gave the lowest EER, 14.06 % (i.e., 16.11 % less than the baseline CQCC), and hence are a very useful countermeasure for detecting replay attacks.
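The abstract does not spell out the filterbank, the VTEO lag, or which ESA variant is used, so the following is only a sketch of one plausible reading, not the authors' implementation. It assumes: the variable length TEO taken as ψ_i[x](n) = x²(n) − x(n−i)x(n+i); an ESA obtained by substituting this operator into DESA-2 (the lag-i formula in `vesa_if` is derived here for illustration); a Butterworth bandpass filterbank; frame-averaged IF per subband; and a DCT across subbands. The function names `vteo`, `vesa_if`, `vesa_ifcc` and every parameter value are hypothetical.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import butter, sosfiltfilt


def vteo(x: np.ndarray, lag: int) -> np.ndarray:
    """Variable length TEO (assumed form): psi_i[x](n) = x(n)^2 - x(n - i) * x(n + i).

    lag = 1 gives the classic TEO. The first and last `lag` samples are dropped.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return x[lag:-lag] ** 2 - x[: -2 * lag] * x[2 * lag :]


def vesa_if(x: np.ndarray, lag: int, eps: float = 1e-12) -> np.ndarray:
    """IF in rad/sample from a lag-i generalisation of DESA-2 (illustrative derivation).

    For x(n) = A cos(W n + phi): psi_i[x] = A^2 sin^2(i W) and, with
    y(n) = x(n + i) - x(n - i), psi_i[y] = 4 A^2 sin^4(i W), hence
    cos(2 i W) = 1 - psi_i[y] / (2 psi_i[x]). Unambiguous only for 0 < W < pi / (2 i).
    Output length is len(x) - 4 * lag.
    """
    y = x[2 * lag :] - x[: -2 * lag]        # y(n) = x(n + i) - x(n - i)
    psi_y = vteo(y, lag)
    psi_x = vteo(x, lag)[lag:-lag]          # trim so both refer to the same n
    # Non-positive energy (not AM-FM-like) is clamped; such samples saturate at pi / (2 i).
    ratio = 1.0 - psi_y / (2.0 * np.maximum(psi_x, eps))
    return np.arccos(np.clip(ratio, -1.0, 1.0)) / (2 * lag)


def vesa_ifcc(x, fs, lag=2, n_bands=20, n_ceps=13, frame_s=0.025, hop_s=0.010):
    """Hypothetical VESA-IFCC pipeline: subband IF -> frame mean (Hz) -> DCT over bands."""
    f_max = 0.9 * fs / (4 * lag)            # keep every band below the limit fs / (4 i)
    edges = np.linspace(50.0, f_max, n_bands + 1)
    n_frame = int(round(frame_s * fs))
    n_hop = int(round(hop_s * fs))
    per_band = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        w = vesa_if(sosfiltfilt(sos, x), lag)   # IF of the narrowband component
        n_frames = 1 + (len(w) - n_frame) // n_hop
        frame_if = [w[k * n_hop : k * n_hop + n_frame].mean() for k in range(n_frames)]
        per_band.append(np.asarray(frame_if) * fs / (2 * np.pi))
    if_matrix = np.stack(per_band, axis=1)      # shape (n_frames, n_bands)
    return dct(if_matrix, type=2, norm="ortho", axis=1)[:, :n_ceps]
```

Because the lag-i estimator is only unambiguous below fs/(4i), a longer lag trades frequency coverage for a wider analysis span; with a lag of 2 at 16 kHz this band layout stops near 1.8 kHz. That limitation belongs to this sketch, not to anything the abstract claims about the proposed features.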
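The results above are reported as equal error rates (EER), and the best development-set system combines several feature sets by score-level fusion. A minimal sketch of both steps follows, assuming that a higher score means "genuine" and that fusion is a plain weighted sum of per-system scores; the abstract does not state the fusion weights or any score normalisation, so `eer`, `fuse`, and the weights are hypothetical.

```python
import numpy as np


def eer(genuine, spoof) -> float:
    """Equal error rate for a detector where a higher score means 'genuine'.

    Sweeps the threshold over all observed scores and returns the average of the
    false rejection rate (genuine rejected) and false acceptance rate (spoof
    accepted) at the point where the two are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    scores = np.concatenate([genuine, spoof])
    labels = np.concatenate([np.ones(genuine.size), np.zeros(spoof.size)])
    labels = labels[np.argsort(scores, kind="stable")]
    # Entry 0: threshold below every score (nothing rejected).
    # Entry k: threshold just above the k-th sorted score (it and all below are rejected).
    frr = np.concatenate([[0.0], np.cumsum(labels) / genuine.size])
    far = np.concatenate([[1.0], 1.0 - np.cumsum(1.0 - labels) / spoof.size])
    k = int(np.argmin(np.abs(frr - far)))
    return float(0.5 * (frr[k] + far[k]))


def fuse(score_sets, weights) -> np.ndarray:
    """Score-level fusion as a weighted sum.

    score_sets: shape (n_systems, n_trials); weights: shape (n_systems,).
    """
    return np.asarray(weights, dtype=float) @ np.asarray(score_sets, dtype=float)
```

For example, `eer([2, 3], [0, 1])` returns 0.0 because the two score sets are perfectly separated. In practice the fusion weights would be tuned on the development set and then frozen for the evaluation set.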
