A novel framework for text-independent speaker verification is proposed. The framework is based on a new interpretation of the Universal Background Model (UBM): the UBM defines a transform that maps a variable-length observation sequence into a fixed-dimensional supervector space. Each speech utterance is thus mapped to a point in this supervector space, and the similarity measure in the space is progressively refined via an iterative cohort modeling scheme. Experiments on the NIST 2002 corpus show the effectiveness of the framework. Overall, the EER drops from 9.21% for the baseline system (with T-Norm) to 8.07% for the final improved system (without T-Norm). A second set of experiments shows that the framework effectively reduces the data dependence of the final output score. Applying T-Norm to the final system increases its EER marginally, by 1.73% relative, whereas applying T-Norm to the baseline system reduces its EER by 16.12% relative. Likewise, the relative improvement in DCF from T-Norm is marginal for the final improved system (2.47%), compared with 33.68% for the baseline system. This clearly shows that iterative cohort modeling reduces the data dependence of the final scores, so that T-Norm does not further improve system performance. In addition, the performance of the framework improves as the number of iterations grows, which suggests that iterative cohort modeling progressively refines the similarity measure on the supervector space.

Index Terms: speaker verification, utterance transform, iterative cohort modeling.
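The abstract does not spell out the UBM-defined transform, so the following is only a minimal sketch under stated assumptions. It uses a common construction (posterior-weighted component means of a diagonal-covariance GMM, stacked into one vector) to illustrate how utterances of different lengths can land in the same fixed-dimensional space, and it shows the standard T-Norm formula for reference. The function names, the diagonal-covariance assumption, and the fallback to the UBM mean for unoccupied components are all hypothetical choices, not details taken from the paper.

```python
import numpy as np

def log_gauss_diag(frames, means, variances):
    """Log density of each frame under each diagonal Gaussian.
    frames: (T, D), means: (K, D), variances: (K, D) -> (T, K)."""
    diff = frames[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances, axis=2)                # (T, K)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (quad + log_det)

def utterance_to_supervector(frames, weights, means, variances):
    """Map a (T, D) utterance to a fixed (K*D,) vector, for any T.
    Assumed construction: posterior-weighted mean per UBM component."""
    log_p = np.log(weights) + log_gauss_diag(frames, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)     # frame-level posteriors (T, K)
    n_k = post.sum(axis=0)                      # soft counts (K,)
    f_k = post.T @ frames                       # first-order statistics (K, D)
    safe_n = np.maximum(n_k, 1e-8)[:, None]
    # Components with essentially no data fall back to the UBM mean.
    comp_means = np.where(n_k[:, None] > 1e-8, f_k / safe_n, means)
    return comp_means.reshape(-1)

def t_norm(raw_score, cohort_scores):
    """Standard T-Norm: normalize a trial score by the mean and standard
    deviation of the same test utterance's scores against cohort models."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    return (raw_score - cohort_scores.mean()) / cohort_scores.std()

# Usage with a toy 4-component UBM over 3-dimensional features.
rng = np.random.default_rng(0)
K, D = 4, 3
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
short_utt = rng.normal(size=(50, D))
long_utt = rng.normal(size=(120, D))
sv_short = utterance_to_supervector(short_utt, ubm_w, ubm_mu, ubm_var)
sv_long = utterance_to_supervector(long_utt, ubm_w, ubm_mu, ubm_var)
print(sv_short.shape, sv_long.shape)  # both have K*D entries
```

Both utterances yield vectors of the same dimension despite their different frame counts, which is the property the supervector interpretation relies on. The similarity measure and the iterative cohort modeling step are not sketched here because the abstract gives no details about them.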