Compressed high dimensional features for speaker spoofing detection

The vulnerability of Automatic Speaker Verification (ASV) systems to spoofing attacks such as speech synthesis (SS) and voice conversion (VC) has recently been demonstrated. High-dimensional magnitude- and phase-based features deliver outstanding spoofing detection performance but are not compatible with the Gaussian Mixture Model (GMM) classifiers that are commonly deployed in speaker recognition systems. In this paper, a Compressed Sensing (CS) framework is first combined with high-dimensional (HD) features, and a derived CS-HD feature is proposed. A standalone spoofing detector built on the GMM classifier is evaluated on the ASVspoof 2015 database. Two ASV systems integrated with the spoofing detector are also tested. For the standalone detector, equal error rates (EERs) of 0.01% and 5.35% are reached on the evaluation set for known and unknown attacks, respectively. For the ASV systems, the best EERs are 0.02% and 5.26%. The proposed CS-HD feature obtains results similar to those of other systems at a lower feature dimension, which suggests that the verification system can be made more computationally efficient.
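To make the two quantitative ideas in the abstract concrete, the following minimal sketch shows (a) a generic compressed-sensing style projection of a high-dimensional feature vector to a lower dimension and (b) one common way to estimate an equal error rate from detector scores. It is not the paper's CS-HD derivation, which the abstract does not specify: the Gaussian random measurement matrix, the dimensions 512 and 64, and the function names `make_measurement_matrix`, `compress`, and `equal_error_rate` are all assumptions chosen for illustration.

```python
import numpy as np


def make_measurement_matrix(m, n, seed=0):
    """Hypothetical Gaussian random measurement matrix of shape (m, n), m < n."""
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(m) keeps vector norms roughly unchanged on average.
    return rng.standard_normal((m, n)) / np.sqrt(m)


def compress(features, phi):
    """Project features of shape (num_frames, n) to (num_frames, m) via y = phi @ x."""
    return features @ phi.T


def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER: the operating point where false acceptance ~ false rejection."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2


if __name__ == "__main__":
    phi = make_measurement_matrix(m=64, n=512)             # assumed dimensions
    hd = np.random.default_rng(1).standard_normal((10, 512))  # stand-in HD features
    print(compress(hd, phi).shape)
    print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
```

In a detector of the kind described above, the compressed vectors would be passed to GMMs trained on genuine and spoofed speech, and the resulting scores would be fed to an EER computation like this one; that classifier stage is omitted here.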
