Replay Attack Detection Using Generalized Cross-Correlation of Stereo Signal

In this paper, we propose a replay attack detection method for automatic speaker verification that uses the generalized cross-correlation (GCC) of a stereo signal. In particular, the method focuses on a characteristic specific to replay attacks that appears when speech is not active. For a genuine speaker, the maximum value of the GCC is low when speech is not active, since the surrounding noise arrives from all directions. In contrast, in a replay attack, the maximum value of the GCC remains high even when the played speech is not active, since the loudspeaker used for the attack still emits recorded noise or electromagnetic noise from a single direction. Based on this assumption, two approaches to replay attack detection are introduced. The first uses the minimum value of the GCC in short pauses. The second uses the average value of the GCC in the silent periods before the start point and after the end point of a target utterance. Experiments confirm that the proposed methods achieve low error rates without environmental restrictions.
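The contrast described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GCC weighting, the frame settings, or the decision thresholds, so the PHAT weighting, the helper names `gcc_phat_peak` and `framewise_peaks`, the frame length, and the synthetic signals are all assumptions made for illustration. Independent noise in the two channels stands in for diffuse ambient noise, and a single noise source delayed by a few samples between channels stands in for noise emitted by a loudspeaker.

```python
import numpy as np

def gcc_phat_peak(left, right, eps=1e-12):
    """Peak of the PHAT-weighted GCC between two channels (assumed weighting)."""
    n = len(left) + len(right)            # zero-pad to avoid circular wrap-around
    spec_l = np.fft.rfft(left, n=n)
    spec_r = np.fft.rfft(right, n=n)
    cross = spec_l * np.conj(spec_r)      # cross-power spectrum
    cross = cross / (np.abs(cross) + eps) # PHAT: keep phase, discard magnitude
    gcc = np.fft.irfft(cross, n=n)        # correlation as a function of lag
    return float(np.max(gcc))

def framewise_peaks(left, right, frame_len=1024, hop=512):
    """GCC peak for each frame of a non-speech segment (hypothetical framing)."""
    starts = range(0, len(left) - frame_len + 1, hop)
    return np.array([
        gcc_phat_peak(left[s:s + frame_len], right[s:s + frame_len])
        for s in starts
    ])

rng = np.random.default_rng(0)
num, delay = 8192, 5

# Genuine case (synthetic stand-in): independent noise in each channel.
diffuse_left = rng.standard_normal(num)
diffuse_right = rng.standard_normal(num)

# Replay case (synthetic stand-in): one source reaching both microphones
# with a fixed inter-channel delay.
source = rng.standard_normal(num + delay)
replay_left = source[delay:]
replay_right = source[:num]

genuine_peaks = framewise_peaks(diffuse_left, diffuse_right)
replay_peaks = framewise_peaks(replay_left, replay_right)

# The two statistics named in the abstract, read here as statistics of the
# per-frame GCC peak (an interpretation, not confirmed by the source).
print("genuine: min", genuine_peaks.min(), "mean", genuine_peaks.mean())
print("replay:  min", replay_peaks.min(), "mean", replay_peaks.mean())
```

In this toy setting the per-frame peaks for the single-source case sit well above those for the independent-noise case, so either the minimum or the average could be compared against a threshold. Real recordings would additionally need a voice activity detector to locate the short pauses and the silent periods around the utterance, which this sketch omits.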
