Client Dependent GMM-SVM Models for Speaker Verification

Generative Gaussian Mixture Models (GMMs) are known to be the dominant approach for modeling speech sequences in text independent speaker verification applications because of their scalability, good performance and their ability in handling variable size sequences. On the other hand, because of their discriminative properties, models like Support Vector Machines (SVMs) usually yield better performance in static classification problems and can construct flexible decision boundaries. In this paper, we try to combine these two complementary models by using Support Vector Machines to postprocess scores obtained by the GMMs. A cross-validation method is also used in the baseline system to increase the number of client scores in the training phase, which enhances the results of the SVM models. Experiments carried out on the XM2VTS and PolyVar databases confirm the interest of this hybrid approach.

[1]  Sadaoki Furui,et al.  Recent advances in speaker recognition , 1997, Pattern Recognit. Lett..

[2]  Vladimir Cherkassky,et al.  The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.

[3]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[4]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  Samy Bengio,et al.  A comparative study of adaptation methods for speaker verification , 2002, INTERSPEECH.

[7]  Samy Bengio,et al.  Learning the decision function for speaker verification , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[8]  Sadaoki Furui,et al.  Recent Advances in Speaker Recognition (Invited Paper) , 1997, AVBPA.

[9]  Gérard Chollet,et al.  Swiss French PolyPhone and PolyVar: telephone speech databases to model inter- and intra-speaker variability , 1996 .

[10]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .