In this paper, we propose a new score normalization method for text-independent speaker verification using Gaussian mixture models (GMMs). In the proposed method, the cohort model is designed as a virtual speaker model based on the similarity of local acoustic information between the reference speaker and other customers. The similarity is determined by a statistical distance between model components such as the Gaussian distributions. The synthesized cohort model is therefore statistically close to the reference speaker model and can provide an effective normalizing score for various observed measurements. Experiments using telephone speech from 60 speakers showed that the proposed method is superior to typical methods that use a cohort speaker model or a pooled model. With a common threshold defined a posteriori for all speakers and a cohort size of three, the equal error rate (EER) was reduced from 3.82% (conventional normalization with a cohort speaker model) or 10.3% (normalization with a pooled model) to 2.50% (proposed method).
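The idea of a synthesized cohort model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, uses the Bhattacharyya distance as the statistical distance (the abstract does not name one), and splits each reference mixture weight equally among the selected components. The function names and the dictionary layout (`weights`, `means`, `variances`) are hypothetical.

```python
import numpy as np

def bhattacharyya_diag(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    v = 0.5 * (v1 + v2)
    mean_term = 0.125 * np.sum((m1 - m2) ** 2 / v)
    cov_term = 0.5 * np.sum(np.log(v) - 0.5 * (np.log(v1) + np.log(v2)))
    return mean_term + cov_term

def gmm_loglik(X, gmm):
    """Average per-frame log-likelihood of frames X (T, D) under a diagonal GMM."""
    diff = X[:, None, :] - gmm["means"][None, :, :]               # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / gmm["variances"], axis=2)
                       + np.sum(np.log(2 * np.pi * gmm["variances"]), axis=1))
    log_w = log_comp + np.log(gmm["weights"])                     # (T, K)
    mx = log_w.max(axis=1, keepdims=True)                         # log-sum-exp
    return float(np.mean(mx[:, 0] + np.log(np.sum(np.exp(log_w - mx), axis=1))))

def synthesize_cohort(ref, others, cohort_size=3):
    """Build a virtual cohort GMM: for every reference component, take the
    cohort_size closest components found in the other speakers' models."""
    o_m = np.concatenate([g["means"] for g in others])
    o_v = np.concatenate([g["variances"] for g in others])
    w_out, m_out, v_out = [], [], []
    for w, m, v in zip(ref["weights"], ref["means"], ref["variances"]):
        d = np.array([bhattacharyya_diag(m, v, om, ov) for om, ov in zip(o_m, o_v)])
        idx = np.argsort(d)[:cohort_size]
        # Assumption: the reference weight is shared equally by the picks.
        w_out.append(np.full(cohort_size, w / cohort_size))
        m_out.append(o_m[idx])
        v_out.append(o_v[idx])
    return {"weights": np.concatenate(w_out),
            "means": np.concatenate(m_out),
            "variances": np.concatenate(v_out)}

def normalized_score(X, ref, cohort):
    """Log-likelihood ratio between the reference model and the cohort model."""
    return gmm_loglik(X, ref) - gmm_loglik(X, cohort)
```

A claimant would then be accepted when `normalized_score(X, ref, cohort)` exceeds a threshold. In this sketch the cohort is built per Gaussian component instead of per speaker, which is one way to read "local acoustic information" in the abstract.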