40 Years of Progress in Automatic Speaker Recognition

Research in automatic speaker recognition has now spanned four decades. This paper surveys the major themes and advances of those 40 years. It aims to give a technological perspective on the fundamental progress made in this important area of speech-based human biometrics. Many techniques have been developed, but many challenges remain before we can reach the ultimate goal of creating human-like machines. Such a machine would need to deliver satisfactory performance under a broad range of operating conditions. A much greater understanding of the human speech process is still required before automatic speaker recognition systems can approach human performance.