Comparison of two kinds of speaker location representation for SVM-based speaker verification

In anchor modeling, each speaker utterance is represented as a fixed-length location vector in the space of reference speakers, obtained by scoring the utterance against a set of anchor models. SVM-based speaker verification systems using this anchor location representation have shown promising results in previously reported work. In this paper, the linear combination weights from reference speaker weighting (RSW) adaptation are explored as an alternative speaker location representation. The RSW location representation is compared with the anchor location representation in various speaker verification tasks on the 2006 NIST Speaker Recognition Evaluation corpus. Experimental results indicate that when utterances are long enough for reliable maximum likelihood estimation in RSW, the RSW location representation yields better speaker verification performance than the anchor location representation, while the latter is more effective for verifying short utterances in a high-dimensional representation space.
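To make the anchor location idea concrete, the following is a minimal sketch and not the system evaluated in the paper. It assumes each anchor model is a diagonal-covariance GMM given as a (weights, means, variances) tuple, and that each score is normalized by a background model; the abstract specifies neither. The function names gmm_avg_loglik and anchor_location are hypothetical.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_mix = np.log(weights) + log_comp                               # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_mix.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_mix - peak).sum(axis=1))
    return float(frame_ll.mean())

def anchor_location(frames, anchors, background):
    """Fixed-length location vector: one score per anchor model.

    Assumption: each score is the average log-likelihood ratio between
    the anchor model and a background model.
    """
    bg_ll = gmm_avg_loglik(frames, *background)
    return np.array([gmm_avg_loglik(frames, *a) - bg_ll for a in anchors])

# Toy usage: three single-component anchors in a 2-D feature space.
rng = np.random.default_rng(0)
frames = rng.normal(loc=3.0, scale=0.5, size=(200, 2))
one, unit = np.array([1.0]), np.ones((1, 2))
anchors = [(one, np.full((1, 2), c), unit) for c in (0.0, 3.0, -3.0)]
background = (one, np.zeros((1, 2)), 4.0 * unit)
location = anchor_location(frames, anchors, background)
```

The resulting vector has one dimension per anchor model regardless of utterance length, which is what allows it to be used as a fixed-length SVM input.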
