Identifying in-set and out-of-set speakers using neighborhood information

We study the problem of in-set/out-of-set speaker identification: deciding whether an unknown input speaker belongs to a known group of in-set speakers or to an unseen out-of-set group. Our baseline is a state-of-the-art GMM classifier with a universal background model (UBM) and a standard likelihood ratio test. We propose an alternative hypothesis testing method that uses neighborhood information around each in-set speaker model in the model space, where neighborhoods are defined by the Kullback-Leibler divergence. The Bayes factor is used in the verification stage to accept or reject the hypothesis. We evaluate the proposed procedure on a clean CORPUS 1 set and on a noisy CORPUS 2 set that contains session-to-session variability. Experiments show an improvement in equal error rate over the baseline, both when in-set speaker models are acoustically close in the model space and as the size of the in-set speaker group increases.
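As a rough illustration of the two ingredients named above, the following Python sketch scores an utterance with a GMM-UBM likelihood ratio test and computes a Kullback-Leibler divergence between two models. It is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the function names, the zero threshold, and the use of single diagonal Gaussians for the divergence are all hypothetical choices made for illustration. The KL divergence between full GMMs has no closed form, and the neighborhood construction and Bayes factor test are not specified in this abstract, so they are not reproduced here.

```python
import numpy as np


def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T, D) under a
    diagonal-covariance GMM with M components.
    weights: (M,), means: (M, D), variances: (M, D)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_mix = np.log(weights) + log_comp
    # Log-sum-exp over components, stabilised by the per-frame maximum.
    peak = log_mix.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_mix - peak).sum(axis=1))
    return float(frame_ll.mean())


def in_set_decision(X, speaker_models, ubm, threshold=0.0):
    """Baseline-style test (assumed form): take the best-scoring in-set
    model, then compare its average log-likelihood ratio against the UBM
    with a threshold. Returns (best_speaker, llr, accept)."""
    scores = {name: gmm_avg_loglik(X, *params)
              for name, params in speaker_models.items()}
    best = max(scores, key=scores.get)
    llr = scores[best] - gmm_avg_loglik(X, *ubm)
    return best, llr, llr >= threshold


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal Gaussians. Used here only
    as a stand-in for a model-space distance between speaker models."""
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    return float(0.5 * np.sum(np.log(var_q / var_p)
                              + (var_p + (mu_p - mu_q) ** 2) / var_q
                              - 1.0))
```

In this sketch, an utterance is accepted as in-set when the best in-set model outscores the UBM by at least the threshold. A neighborhood-based variant would, presumably, replace or supplement the UBM score with scores from models that lie close to the claimed speaker under the divergence; how the paper does this is not stated in the abstract.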
