An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions

Spoofing detection for automatic speaker verification (ASV), which is to discriminate between live and artificial speech, has received increasing attentions recently. However, the previous studies have been done on the clean data without significant noise. It is still not clear whether the spoofing detectors trained on clean speech can generalise well under noisy conditions. In this work, we perform an investigation of spoofing detection under additive noise and reverberant conditions. In particular, we consider five difference additive noises at three different signalto-noise ratios (SNR), and a reverberation noise with different reverberation time (RT). Our experimental results reveal that additive noises degrade the spoofing detectors trained on clean speech significantly. However, the reverberation does not hurt the performance too much.

[1]  Haizhou Li,et al.  Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge , 2015, INTERSPEECH.

[2]  Haizhou Li,et al.  Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition , 2012, INTERSPEECH.

[3]  Sridha Sridharan,et al.  The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition , 2015, INTERSPEECH.

[4]  Nicholas W. D. Evans,et al.  Spoofing countermeasures to protect automatic speaker verification from voice conversion , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Simon King,et al.  Measuring a decade of progress in Text-to-Speech , 2014 .

[6]  Brett Beranek Voice biometrics: success stories, success factors and what's next , 2013 .

[7]  Sébastien Marcel,et al.  On the vulnerability of speaker verification to realistic voice spoofing , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[8]  Steven Furnell,et al.  Surveying the Development of Biometric User Authentication on Mobile Phones , 2015, IEEE Communications Surveys & Tutorials.

[9]  Andrzej Drygajlo,et al.  Speaker verification in noisy environments with combined spectral subtraction and missing feature theory , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[10]  Ibon Saratxaga,et al.  Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Haizhou Li,et al.  A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[12]  Kuldip K. Paliwal,et al.  Short-time phase spectrum in speech processing: A review and some experimental results , 2007, Digit. Signal Process..

[13]  Tomoki Toda,et al.  Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[14]  Junichi Yamagishi,et al.  Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis , 2012, INTERSPEECH.

[15]  Aleksandr Sizov,et al.  ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge , 2015, INTERSPEECH.

[16]  Haizhou Li,et al.  Spoofing detection under noisy conditions: a preliminary investigation and an initial database , 2016, ArXiv.

[17]  Timo Gerkmann,et al.  STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[18]  Haizhou Li,et al.  A study on replay attack and anti-spoofing for text-dependent speaker verification , 2014, Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific.

[19]  Haizhou Li,et al.  Spoofing detection from a feature representation perspective , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Mikhail Khitrov,et al.  Talking passwords: voice biometrics for data access and security , 2013 .

[21]  Nathalie Virag,et al.  Single channel speech enhancement based on masking properties of the human auditory system , 1999, IEEE Trans. Speech Audio Process..

[22]  Aleksandr Sizov,et al.  Joint Speaker Verification and Antispoofing in the $i$ -Vector Space , 2015, IEEE Transactions on Information Forensics and Security.

[23]  Jon Sánchez,et al.  Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information , 2015, IEEE Transactions on Information Forensics and Security.

[24]  Steven Kay,et al.  The effects of noise on the autoregressive spectral estimator , 1979 .

[25]  Aleksandr Sizov,et al.  Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise , 2016, Speech Commun..

[26]  Bayya Yegnanarayana,et al.  Significance of group delay functions in spectrum estimation , 1992, IEEE Trans. Signal Process..

[27]  Haizhou Li,et al.  Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..

[28]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[29]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[30]  Paavo Alku,et al.  Temporally Weighted Linear Prediction Features for Tackling Additive Noise in Speaker Verification , 2010, IEEE Signal Processing Letters.

[31]  Aleksandr Sizov,et al.  Introducing i-vectors for joint anti-spoofing and speaker verification , 2014, INTERSPEECH.

[32]  Haizhou Li,et al.  Synthetic speech detection using temporal modulation feature , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.