IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019

The ASVspoof 2019 challenge focuses on countermeasures for all major spoofing attacks, namely speech synthesis (SS), voice conversion (VC), and replay spoofing attacks. This paper describes the IIIT-H spoofing countermeasures developed for ASVspoof 2019 challenge. In this study, three instantaneous cepstral features namely, single frequency cepstral coefficients, zero time windowing cepstral coefficients, and instantaneous frequency cepstral coefficients are used as front-end features. A Gaussian mixture model is used as back-end classifier. The experimental results on ASVspoof 2019 dataset reveal that the proposed instantaneous features are efficient in detecting VC and SS based attacks. In detecting replay attacks, proposed features are comparable with baseline systems. Further analysis is carried out using metadata to assess the impact of proposed countermeasures on different synthetic speech generating algorithm/replay configurations.

[1]  Tomi Kinnunen,et al.  ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection , 2019, INTERSPEECH.

[2]  Suryakanth V. Gangashetty,et al.  Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients , 2017, INTERSPEECH.

[3]  Nicholas W. D. Evans,et al.  Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification , 2017, Comput. Speech Lang..

[4]  Kong-Aik Lee,et al.  RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Karthika Vijayan,et al.  Significance of analytic phase of speech signals in speaker verification , 2016, Speech Commun..

[6]  Tomi Kinnunen,et al.  A comparison of features for synthetic speech detection , 2015, INTERSPEECH.

[7]  Bayya Yegnanarayana,et al.  Acoustic segmentation of speech using zero time liftering (ZTL) , 2013, INTERSPEECH.

[8]  Bayya Yegnanarayana,et al.  Single Frequency Filtering Approach for Discriminating Speech and Nonspeech , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9]  Jon Sánchez,et al.  Synthetic speech detection using phase information , 2016, Speech Commun..

[10]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[11]  Aleksandr Sizov,et al.  ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge , 2017, IEEE Journal of Selected Topics in Signal Processing.

[12]  Bayya Yegnanarayana,et al.  Analysis and Detection of Phonation Modes in Singing Voice using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC) , 2018, INTERSPEECH.

[13]  Bayya Yegnanarayana,et al.  Spectro-temporal analysis of speech signals using zero-time windowing and group delay function , 2013, Speech Commun..

[14]  Haizhou Li,et al.  On the Importance of Analytic Phase of Speech Signals in Spoken Language Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Niko Brümmer,et al.  The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF , 2013, ArXiv.

[16]  Goutam Saha,et al.  Overview of BTAS 2016 speaker anti-spoofing competition , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[17]  S. R. Mahadeva Prasanna,et al.  Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features , 2017, INTERSPEECH.

[18]  Suryakanth V. Gangashetty,et al.  SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017 , 2017, INTERSPEECH.

[19]  Nicholas Evans Spoofing and countermeasures for speaker verification: a need for standard corpora, protocols and metrics , 2013 .

[20]  Haizhou Li,et al.  Instantaneous Phase and Excitation Source Features for Detection of Replay Attacks , 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[21]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[22]  Haizhou Li,et al.  Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..

[23]  Haizhou Li,et al.  Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition , 2012, INTERSPEECH.

[24]  Bayya Yegnanarayana,et al.  Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients (ZTWCCs) , 2018, INTERSPEECH.

[25]  Anil Kumar Vuppala,et al.  Replay spoofing countermeasures using high spectro-temporal resolution features , 2019, Int. J. Speech Technol..

[26]  Kong-Aik Lee,et al.  t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification , 2018, Odyssey.

[27]  Aleksandr Sizov,et al.  ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge , 2015, INTERSPEECH.

[28]  Haizhou Li,et al.  Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge , 2015, INTERSPEECH.

[29]  Kong-Aik Lee,et al.  The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection , 2017, INTERSPEECH.