Automatic speaker recognition as a measurement of voice imitation and conversion

Voices can be deliberately disguised by means of human imitation or voice conversion. This raises the question of how far a voice can be modified by either method. In this paper, a set of speaker identification experiments is conducted: first, by analysing prosodic features extracted from the voices of professional impersonators attempting to mimic a target voice; and second, by testing a spectral-based speaker recognition system with both intragender and crossgender converted voices. The results show that the identification error rate increases when testing with imitated voices and also with converted voices, especially crossgender ones.
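The results above are expressed as an identification error rate in closed-set speaker identification. The sketch below shows one way such a rate can be computed. It relies on assumptions that do not come from the paper. Each enrolled speaker is modelled by a single diagonal-covariance Gaussian over per-frame feature vectors, which is a simplified stand-in for the GMM-style models common in spectral systems. A test utterance is assigned to the model with the highest average log-likelihood. The error rate is the fraction of test utterances assigned to the wrong speaker. The function names and toy data are hypothetical. The feature vectors could be prosodic (for instance F0 statistics) or spectral (for instance cepstral coefficients).

```python
import math
from typing import Dict, List, Sequence, Tuple

# A frame is one feature vector; a model is (per-dimension means, variances).
# Hypothetical simplification: one diagonal Gaussian per speaker, not a full GMM.
Frame = Sequence[float]
Model = Tuple[List[float], List[float]]


def fit_diag_gaussian(frames: List[Frame], var_floor: float = 1e-6) -> Model:
    """Estimate a single diagonal-covariance Gaussian from enrolment frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variances so a constant feature cannot cause division by zero.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def avg_log_likelihood(frames: List[Frame], model: Model) -> float:
    """Average per-frame log-likelihood of the frames under the model."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, mu, var in zip(f, means, variances):
            total -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)


def identify(frames: List[Frame], models: Dict[str, Model]) -> str:
    """Closed-set identification: return the best-scoring enrolled speaker."""
    return max(models, key=lambda spk: avg_log_likelihood(frames, models[spk]))


def identification_error_rate(
    trials: List[Tuple[str, List[Frame]]], models: Dict[str, Model]
) -> float:
    """Fraction of (true_speaker, frames) trials identified as someone else."""
    errors = sum(
        1 for true_spk, frames in trials if identify(frames, models) != true_spk
    )
    return errors / len(trials)


if __name__ == "__main__":
    # Hypothetical 2-D toy features for two enrolled speakers.
    models = {
        "A": fit_diag_gaussian([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]),
        "B": fit_diag_gaussian([[5.0, 5.0], [6.0, 6.0], [5.0, 6.0], [6.0, 5.0]]),
    }
    trials = [
        ("A", [[0.5, 0.4]]),
        ("B", [[5.6, 5.5]]),
        ("A", [[5.4, 5.6]]),  # speaker A's test frames lie near B's model
    ]
    print(identification_error_rate(trials, models))
```

In this framing, a disguise that works pushes a speaker's test frames toward another speaker's model, as in the third toy trial. That shows up as a higher error rate on the imitated or converted test set than on natural speech.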
