ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

ASVspoof 2021 is the fourth edition in the series of biannual challenges which aim to promote the study of spoofing and the design of countermeasures to protect automatic speaker verification systems from manipulation. In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection. This paper describes all three tasks, the new databases for each of them, the evaluation metrics, four challenge baselines, the evaluation platform and a summary of challenge results. Despite the introduction of channel and compression variability which compound the difficulty, results for the logical access and deepfake tasks are close to those from previous ASVspoof editions. Results for the physical access task show the difficulty in detecting attacks in real, variable physical spaces. With ASVspoof 2021 being the first edition for which participants were not provided with any matched training or development data and with this reflecting real conditions in which the nature of spoofed and deepfake speech can never be predicated with confidence, the results are extremely encouraging and demonstrate the substantial progress made in the field in recent years.

[1]  Niko Brümmer,et al.  Out of a hundred trials, how many errors does your speaker verifier make? , 2021, Interspeech.

[2]  Aleksandr Sizov,et al.  ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge , 2015, INTERSPEECH.

[3]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[4]  Samy Bengio,et al.  A statistical significance test for person authentication , 2004, Odyssey.

[5]  Hye-jin Shim,et al.  Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms , 2020, INTERSPEECH.

[6]  Tomi Kinnunen,et al.  ASVtorch toolkit: Speaker verification with deep neural networks , 2021, SoftwareX.

[7]  Hemlata Tak,et al.  End-to-end anti-spoofing with RawNet2 , 2020 .

[8]  Tomi Kinnunen,et al.  ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech , 2021, IEEE Transactions on Biometrics, Behavior, and Identity Science.

[9]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[10]  Kong-Aik Lee,et al.  ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements , 2018, Odyssey.

[11]  Douglas A. Reynolds,et al.  Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[12]  Sanjeev Khudanpur,et al.  X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Tomi Kinnunen,et al.  ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection , 2019, INTERSPEECH.

[14]  Mickael Rouvier,et al.  On Robustness of Unsupervised Domain Adaptation for Speaker Recognition , 2019, INTERSPEECH.

[15]  Nicholas W. D. Evans,et al.  Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification , 2017, Comput. Speech Lang..

[16]  Aleksandr Sizov,et al.  ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge , 2017, IEEE Journal of Selected Topics in Signal Processing.

[17]  Joon Son Chung,et al.  VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.

[18]  Galina Lavrentyeva,et al.  STC Antispoofing Systems for the ASVspoof2019 Challenge , 2019, INTERSPEECH.

[19]  Kong-Aik Lee,et al.  The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection , 2017, INTERSPEECH.

[20]  Tomi Kinnunen,et al.  A comparison of features for synthetic speech detection , 2015, INTERSPEECH.

[21]  Sébastien Le Maguer,et al.  ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech , 2019, Comput. Speech Lang..

[22]  Kong-Aik Lee,et al.  t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification , 2018, Odyssey.

[23]  Xin Wang,et al.  A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection , 2021, Interspeech.

[24]  Joon Son Chung,et al.  VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.