Multiple Phase Information Combination for Replay Attacks Detection

In recent years, the performance of Automatic Speaker Verification (ASV) systems has improved significantly. However, they remain vulnerable to different kinds of spoofing attacks. In this paper, we propose a method that fuses different phase features and amplitude features to detect replay attacks. We apply the mel-scale relative phase feature and the source-filter vocal tract feature in the phase domain to replay attack detection. These two phase-based features are combined to obtain complementary information. In addition to these phase features, constant Q cepstral coefficients (CQCCs) are used. The proposed methods are evaluated on the ASVspoof 2017 challenge database, with a Gaussian mixture model as the back-end model. The proposed approach achieves a 55.6% relative error reduction compared with the conventional magnitude-based feature.
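
As a rough illustration of two quantities the abstract refers to, the sketch below computes a relative error reduction and combines per-utterance scores from several feature streams. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say at which level the features are fused, so the weighted score-level sum, the function names `relative_error_reduction` and `fuse_scores`, and all numeric values are hypothetical.

```python
def relative_error_reduction(baseline_error, proposed_error):
    """Relative reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


def fuse_scores(score_lists, weights):
    """Weighted sum of per-utterance scores from several subsystems.

    Assumption: each subsystem (for example, one GMM back-end per feature
    type) outputs one score per utterance, and fusion is a linear
    combination of those scores. The paper may fuse differently.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per subsystem")
    n_utterances = len(score_lists[0])
    if any(len(scores) != n_utterances for scores in score_lists):
        raise ValueError("all subsystems must score the same utterances")
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_utterances)
    ]


# Hypothetical numbers, for illustration only (not results from the paper).
reduction = relative_error_reduction(20.0, 10.0)
fused = fuse_scores([[1.0, -1.0], [3.0, 1.0]], [0.5, 0.5])
```

In practice the fusion weights would typically be tuned on a development set rather than fixed by hand.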
