One-class learning towards generalized voice spoofing detection

Human voices can be used to authenticate a speaker's identity, but automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks such as impersonation, replay, text-to-speech, and voice conversion. Researchers have recently developed anti-spoofing techniques to improve the reliability of ASV systems against such attacks. However, most methods struggle in practice to detect unknown attacks, whose statistical distributions often differ from those of known attacks. In this work, we propose an anti-spoofing system that uses one-class learning to detect unknown logical access attacks (i.e., synthetic speech). The key idea is to compact the genuine speech representation and inject an angular margin that separates spoofing attacks from it in the embedding space. Our system achieves an equal error rate of 2.19% on the evaluation set of the ASVspoof 2019 Challenge, outperforming all existing single systems.
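The key idea can be illustrated with a minimal sketch of a one-class, angular-margin loss. This is one plausible reading of the abstract, not the system's actual implementation: the function name `one_class_margin_loss`, the label convention (0 = genuine, 1 = spoof), the two margins `m_genuine` and `m_spoof`, and the scale `alpha` are all assumptions chosen for illustration. Genuine embeddings are pulled toward a single direction `w` (compacting them), while spoofed embeddings are only required to stay below a looser cosine threshold, so no particular attack distribution is modeled.

```python
import numpy as np

def one_class_margin_loss(embeddings, labels, w,
                          m_genuine=0.9, m_spoof=0.2, alpha=20.0):
    """Illustrative one-class loss with two cosine margins (assumed values).

    embeddings: (N, D) array of utterance embeddings.
    labels:     (N,) array, 0 = genuine, 1 = spoof (assumed convention).
    w:          (D,) direction representing the genuine class.
    """
    # Work on the unit sphere so only the angle to w matters.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = w / np.linalg.norm(w)
    cos = x @ w

    # Genuine speech is penalized when cos < m_genuine (tight cluster);
    # spoofed speech is penalized when cos > m_spoof (pushed away).
    violation = np.where(labels == 0, m_genuine - cos, cos - m_spoof)

    # Softplus log(1 + exp(z)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, alpha * violation)))


if __name__ == "__main__":
    w = np.array([1.0, 0.0])
    emb = np.array([[1.0, 0.0], [-1.0, 0.0]])
    # Genuine aligned with w, spoof opposite: small loss.
    print(one_class_margin_loss(emb, np.array([0, 1]), w))
    # Labels swapped: large loss.
    print(one_class_margin_loss(emb, np.array([1, 0]), w))
```

At test time, under the same assumptions, the cosine similarity to `w` could serve directly as the genuineness score, with a threshold on that score deciding between genuine and spoofed speech.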
