Evaluation of EMD-Based Speaker Recognition Using ISCSLP2006 Chinese Speaker Recognition Evaluation Corpus

In this paper, we present evaluation results for our proposed text-independent speaker recognition method based on the Earth Mover’s Distance (EMD), obtained on the ISCSLP2006 Chinese speaker recognition evaluation corpus developed by the Chinese Corpus Consortium (CCC). EMD-based speaker recognition (EMD-SR) was originally designed for a distributed speaker identification system, in which feature vectors are compressed by vector quantization at a terminal and sent to a server that performs pattern matching. Because this architecture requires speaker models to be trained on quantized data, we adopted a non-parametric speaker model and EMD. In experiments on a Japanese speech corpus, EMD-SR was more robust to quantized data than the conventional GMM technique, and it also achieved higher accuracy than GMM when the data were not quantized. We therefore entered the ISCSLP2006 speaker recognition evaluation with EMD-SR. Since the identification tasks defined in the evaluation are open-set, this paper also introduces a new speaker verification module. Evaluation results show that EMD-SR achieves a 99.3% Identification Correctness Rate on a closed-channel speaker identification task.
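To make the matching step concrete, the following is a minimal sketch of EMD-based open-set identification under stated assumptions; it is not the authors' EMD-SR implementation. We assume that each speaker model is a signature made of the VQ codewords observed in that speaker's training data, weighted by their relative frequency, that the ground distance is Euclidean, and that out-of-set rejection is a simple threshold on the smallest EMD. The helpers `signature` and `identify_open_set` and the `threshold` parameter are hypothetical stand-ins, not the verification module described in this paper. EMD is computed as the transportation linear program solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def emd(w_p, x_p, w_q, x_q):
    """EMD between signatures (w_p, x_p) and (w_q, x_q).

    Solves the transportation LP: minimise sum_ij f_ij * d_ij subject to
    sum_j f_ij <= w_p[i], sum_i f_ij <= w_q[j], f_ij >= 0 and
    sum_ij f_ij = min(sum(w_p), sum(w_q)). The Euclidean ground distance
    is an assumption; EMD-SR may use a different one.
    """
    w_p, w_q = np.asarray(w_p, float), np.asarray(w_q, float)
    x_p, x_q = np.asarray(x_p, float), np.asarray(x_q, float)
    m, n = len(w_p), len(w_q)
    # d[i * n + j] = ||x_p[i] - x_q[j]||, matching row-major order of f_ij.
    d = np.linalg.norm(x_p[:, None, :] - x_q[None, :, :], axis=2).ravel()
    a_ub = np.vstack([
        np.kron(np.eye(m), np.ones((1, n))),  # total flow out of each p-point
        np.kron(np.ones((1, m)), np.eye(n)),  # total flow into each q-point
    ])
    b_ub = np.concatenate([w_p, w_q])
    total = min(w_p.sum(), w_q.sum())
    res = linprog(d, A_ub=a_ub, b_ub=b_ub, A_eq=np.ones((1, m * n)),
                  b_eq=[total], bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun / total  # normalise the transport cost by total flow


def signature(codes, codebook):
    """Hypothetical helper: VQ indices -> (relative frequencies, codewords used)."""
    idx, counts = np.unique(codes, return_counts=True)
    return counts / counts.sum(), codebook[idx]


def identify_open_set(test_sig, models, threshold):
    """Return the closest speaker by EMD, or None (rejected as out-of-set)
    when the smallest distance exceeds the hypothetical threshold."""
    scores = {spk: emd(*test_sig, *sig) for spk, sig in models.items()}
    best = min(scores, key=scores.get)
    return (best if scores[best] <= threshold else None), scores


# Toy usage with a 1-D codebook and made-up VQ index sequences.
codebook = np.array([[0.0], [1.0], [2.0], [3.0]])
models = {"A": signature([0, 0, 1, 1], codebook),
          "B": signature([2, 3, 3, 3], codebook)}
test_sig = signature([0, 1, 1, 1], codebook)
speaker, scores = identify_open_set(test_sig, models, threshold=1.0)
print(speaker, {k: round(v, 3) for k, v in scores.items()})
```

In this toy case the test signature is 0.25 from speaker A (a quarter of the mass moves one unit) and 2.0 from speaker B, so A is accepted; lowering the threshold below 0.25 would reject the utterance as coming from an unregistered speaker. Because the speaker model is just a weighted set of codewords, it can be built directly from quantized data, which is the property that motivated a non-parametric model in the distributed setting.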
