Text-Independent Speaker Verification: State of the Art and Challenges

Speech is often the only modality available for recognizing a person's identity (over the telephone, on the radio, in the dark, ...). Automatic speaker recognition has been studied for several decades. This chapter reviews the current state of text-independent speaker verification research. Basic principles of speaker recognition are first summarized. The choice of speech features and speaker models is mostly related to the individual characteristics (variability) of the speakers' voices. Besides speaker variability, other factors, such as microphone or transmission channel variability, degrade the performance of speaker verification algorithms. Some of these issues are illustrated with the NIST 2005 and 2006 speaker recognition evaluation campaigns. The field of speaker verification is also reviewed in relation to speech recognition, focusing on the use of this new source of information. This relationship is an important issue in the development of new services based on speaker and speech recognition. An overview of recent results in this field is given. In particular, examples are reported of combining baseline Gaussian Mixture Models (GMM) with high-level information extracted through data-driven speech segmentation.
