Revisiting the security of speaker verification systems against imposture using synthetic speech

In this paper, we investigate imposture using synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both speaker verification (SV) and speech synthesis have renewed interest in this problem. We use a HMM-based speech synthesizer which creates synthetic speech for a targeted speaker through adaptation of a background model. We use two SV systems: standard GMM-UBM-based and a newer SVM-based. Our results show when the systems are tested with human speech, there are zero false acceptances and zero false rejections. However, when the systems are tested with synthesized speech, all claims for the targeted speaker are accepted while all other claims are rejected. We propose a two-step process for detection of synthesized speech in order to prevent this imposture. Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech will lead to an unacceptably high false acceptance rate.

[1]  Keiichi Tokuda,et al.  Imposture using synthetic speech against speaker verification based on spectrum and pitch , 2000, INTERSPEECH.

[2]  Keiichi Tokuda,et al.  Speech synthesis using HMMs with dynamic features , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[3]  Mark J. F. Gales,et al.  Combining Derivative and Parametric Kernels for Speaker Verification , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Keiichi Tokuda,et al.  A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst..

[5]  Keiichi Tokuda,et al.  On the security of HMM-based speaker verification systems against imposture using synthetic speech , 1999, EUROSPEECH.

[6]  Tomi Kinnunen,et al.  Real-time speaker identification and verification , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Heiga Zen,et al.  Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 , 2007, IEICE Trans. Inf. Syst..

[8]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[9]  Heiga Zen,et al.  Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Sadaoki Furui,et al.  Likelihood normalization for speaker verification using a phoneme- and speaker-independent model , 1995, Speech Commun..