Speaker verification over the telephone

Speaker verification has been the subject of active research for many years, yet despite these efforts and promising results on laboratory data, speaker verification performance over the telephone remains below that required for many applications. This experimental study aimed to quantify speaker recognition performance outside the context of any specific application, as a function of factors generally acknowledged to affect accuracy. The issues addressed include: the speaker model (Gaussian mixture models are compared with phone-based models); the influence of the amount and content of training and test data on performance; performance degradation due to model aging and how this can be counteracted using adaptation techniques; and the performance levels achievable in text-dependent and text-independent recognition modes. These and other factors were addressed using a large corpus of read and spontaneous speech in French (over 250 hours collected from 100 target speakers and 1000 imposters), designed and recorded for the purpose of this study. On these data, the lowest equal error rate is 1%, obtained in text-dependent mode when two trials are allowed per verification attempt, with a minimum of 1.5 s of speech per trial. © 2000 Elsevier Science B.V. All rights reserved.
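For readers unfamiliar with the metric quoted above, the equal error rate (EER) is the operating point at which the false-rejection rate for true speakers equals the false-acceptance rate for imposters. The following is a minimal sketch of how an EER can be estimated from two lists of verification scores. It is not the study's own scoring code: the function names, the threshold sweep, and the modelling of "two trials per attempt" as the maximum of two scores are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Returns (eer, threshold) at the point where the false-rejection rate
    and the false-acceptance rate are closest. Higher scores are assumed
    to mean "more likely the claimed speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target, impostor]))
    best = None
    for t in thresholds:
        frr = np.mean(target < t)      # true speakers rejected at threshold t
        far = np.mean(impostor >= t)   # imposters accepted at threshold t
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return float(best[1]), float(best[2])

def best_of_two(first_trial_scores, second_trial_scores):
    """Assumed model of a two-trial attempt: accept if either trial passes,
    which is equivalent to thresholding the larger of the two scores."""
    return np.maximum(first_trial_scores, second_trial_scores)

# Hypothetical scores, for illustration only (not data from the study).
targets = [2.0, 3.0, 4.0, 5.0]
impostors = [0.0, 1.0, 1.5, 2.5]
eer, threshold = equal_error_rate(targets, impostors)
```

Under this assumed two-trial model, a second trial raises the effective score of imposters as well as of true speakers, so any gain in EER depends on how the two score distributions shift relative to each other.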
