Text-independent Speaker Verification

This chapter first gives an overview of text-independent speaker verification. It then presents the recent developments needed to reach state-of-the-art performance with low-level (acoustic) features, and shows how complementary high-level information can be used. The most relevant speaker verification evaluation campaigns and databases are also summarized. Next, the BioSecure benchmarking framework for speaker verification is presented; it relies on open-source state-of-the-art algorithms, well-known databases, and reference protocols. A case study on the National Institute of Standards and Technology 2005 Speaker Recognition Evaluation data (NIST’2005 SRE) shows how to reach state-of-the-art performance using open-source software. The key factors influencing the performance of speaker verification experiments on the NIST’2005 evaluation data are illustrated with three sets of experiments. The first set concerns the importance of front-end processing and data selection for fine-tuning the acoustic Gaussian mixture systems. The second set illustrates the importance of speaker and session variability modeling methods for coping with mismatched enrollment and test conditions. The third set demonstrates the usefulness of data-driven speech segmentation methods for extracting complementary high-level information. The chapter ends with conclusions and perspectives.
