Efficient speaker identification using distributional speaker model clustering

In large-population speaker identification (SI) systems, computing likelihoods between an unknown speaker's test feature vectors and the speaker models can be very time-consuming, which is detrimental to applications that require fast SI. In this paper, we propose a method in which speaker models are clustered during the training stage using a distributional distance measure such as the KL divergence. During the testing stage, only those clusters that are likely to contain high-likelihood speaker models are searched. The proposed method reduces the speaker model search space, which directly results in faster SI. Any loss in identification accuracy can be controlled by trading off speed against accuracy. We implement a GMM-UBM based SI system with MAP-adapted speaker models and present results on the TIMIT, NTIMIT, and NIST-2002 large-population speech corpora.
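The two-stage idea described above (cluster the models in training, search only promising clusters in testing) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: each speaker model is reduced to a single diagonal-covariance Gaussian, for which the KL divergence has a closed form (between full GMMs it does not and must be approximated); clustering uses a simple k-medoids loop; and the function names `sym_kl`, `avg_loglik`, `cluster_models`, and `identify` are hypothetical.

```python
import numpy as np

def sym_kl(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    d2 = (m1 - m2) ** 2
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + d2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + d2) / v1 - 1.0)
    return kl12 + kl21

def avg_loglik(X, m, v):
    """Average per-frame log-likelihood of frames X under one Gaussian."""
    return np.mean(-0.5 * np.sum(np.log(2 * np.pi * v) + (X - m) ** 2 / v, axis=1))

def cluster_models(means, variances, k, n_iter=10):
    """Training stage (assumed k-medoids): group models by symmetric KL."""
    n = len(means)
    D = np.array([[sym_kl(means[i], variances[i], means[j], variances[j])
                   for j in range(n)] for i in range(n)])
    medoids = list(range(k))  # deterministic start: the first k models
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:  # keep the old representative if a cluster empties
                new.append(medoids[c])
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(int(members[np.argmin(within)]))
        if new == medoids:
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

def identify(X, means, variances, labels, medoids, n_search=1):
    """Testing stage: score only the speakers in the n_search best clusters."""
    cluster_scores = [avg_loglik(X, means[m], variances[m]) for m in medoids]
    top = np.argsort(cluster_scores)[::-1][:n_search]
    candidates = np.where(np.isin(labels, top))[0]
    scores = [avg_loglik(X, means[s], variances[s]) for s in candidates]
    return int(candidates[int(np.argmax(scores))]), len(candidates)
```

In this sketch, `n_search` plays the role of the speed/accuracy control: searching more clusters scores more speaker models, which costs time but lowers the chance of missing the true speaker.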
