Paper Information - Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation

Speaker indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems in both accuracy and efficiency. In this paper, an efficient method to simulate GMM scoring is presented. The simulation fits a GMM not only to every target speaker but also to every test utterance, and then computes the likelihood of the test utterance from these GMMs instead of from the original data. GMM simulation is used to achieve very efficient speaker indexing in terms of both search time and index size. Results on the SPIDRE and NIST-2004 speaker evaluation corpora show that our approach maintains and sometimes exceeds the accuracy of the conventional GMM algorithm and achieves efficient indexing capabilities: it is 6000 times faster than a conventional GMM, with 1% storage overhead.
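The idea of scoring one GMM against another, without revisiting the audio frames, can be illustrated with a minimal sketch. It is not the paper's exact formula, which the abstract does not give. It assumes diagonal covariances and approximates the expected log-likelihood of the test GMM under the target GMM by letting each test component pick its best-matching target component. The function names `expected_log_gauss` and `simulated_score`, and the `(weights, means, variances)` tuple layout, are hypothetical.

```python
import numpy as np

def expected_log_gauss(mu_q, var_q, mu_p, var_p):
    """Closed form of E_{x ~ N(mu_q, diag(var_q))}[log N(x; mu_p, diag(var_p))]."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * var_p) + (var_q + (mu_q - mu_p) ** 2) / var_p
    )

def simulated_score(test_gmm, target_gmm):
    """Approximate the per-frame log-likelihood of the test data under the
    target GMM using only the parameters of the two models.

    Assumption (not taken from the paper): each test component is scored
    against its single best target component (a max approximation).
    Each GMM is a tuple (weights, means, variances) with shapes
    (K,), (K, D), (K, D).
    """
    w_q, mu_q, var_q = test_gmm
    w_p, mu_p, var_p = target_gmm
    score = 0.0
    for g in range(len(w_q)):
        best = max(
            np.log(w_p[j])
            + expected_log_gauss(mu_q[g], var_q[g], mu_p[j], var_p[j])
            for j in range(len(w_p))
        )
        score += w_q[g] * best
    return score

# Toy data: a test-utterance GMM, a matching target and a mismatched target.
test_gmm = (np.array([0.5, 0.5]),
            np.array([[0.0, 0.0], [3.0, 3.0]]),
            np.ones((2, 2)))
target_a = (np.array([0.5, 0.5]),
            np.array([[0.0, 0.0], [3.0, 3.0]]),
            np.ones((2, 2)))
target_b = (np.array([0.5, 0.5]),
            np.array([[10.0, 10.0], [-10.0, -10.0]]),
            np.ones((2, 2)))

score_a = simulated_score(test_gmm, target_a)
score_b = simulated_score(test_gmm, target_b)
print(score_a > score_b)
```

The cost of this score depends on the number of mixture components and the feature dimension, not on the number of frames in the utterance, which is the property that makes model-to-model scoring attractive for indexing large archives.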

Hagai Aronowitz | Amihood Amir | David Burshtein
