Speaker verification against synthetic speech

With the development of HMM-based parametric speech synthesis, impostors can easily generate synthetic speech that carries a specific speaker's characteristics, which poses a serious threat to state-of-the-art speaker verification systems. In this paper, we investigate the differences in Mel-cepstral coefficients (MCEPs) between natural and synthetic speech. Experiments demonstrate that synthetic speech can be discriminated from natural speech using the higher-order MCEPs.
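
To illustrate how higher-order MCEPs might be used as a cue, here is a minimal Python sketch. The abstract does not state the decision rule, so everything here is an assumption: the sketch supposes that MCEPs have already been extracted into a frames-by-order array, and that synthetic speech shows lower frame-to-frame variance in its higher-order coefficients than natural speech. The function names, the `start_order` cut-off, and the threshold are hypothetical and are not taken from the paper.

```python
import numpy as np


def higher_order_variance(mcep, start_order=20):
    """Mean per-coefficient variance of the coefficients from start_order upward.

    mcep: array of shape (n_frames, n_coefficients), assumed precomputed.
    start_order: hypothetical cut-off index for "higher-order" coefficients.
    """
    mcep = np.asarray(mcep, dtype=float)
    if mcep.ndim != 2 or mcep.shape[1] <= start_order:
        raise ValueError("mcep must be 2-D with more than start_order columns")
    # Variance over time (axis 0) for each higher-order coefficient, then averaged.
    return float(np.mean(np.var(mcep[:, start_order:], axis=0)))


def looks_synthetic(mcep, threshold, start_order=20):
    """Flag an utterance as synthetic if its higher-order variance is below threshold.

    The threshold is an assumption; in practice it would be tuned on held-out data.
    """
    return higher_order_variance(mcep, start_order) < threshold


# Toy data only: random numbers standing in for real MCEP trajectories.
rng = np.random.default_rng(0)
toy_natural = rng.normal(0.0, 1.0, size=(500, 40))
toy_synthetic = toy_natural.copy()
toy_synthetic[:, 20:] *= 0.3  # simulate smoothed higher-order coefficients

print(looks_synthetic(toy_natural, threshold=0.5))
print(looks_synthetic(toy_synthetic, threshold=0.5))
```

A single variance threshold is used only to keep the example short; a trained classifier over the same higher-order coefficients would be an equally plausible reading of the abstract.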
