Comparison of Algorithms for Speaker Identification under Adverse Far-Field Recording Conditions with Extremely Short Utterances

In this paper, we compare state-of-the-art algorithms for text-independent speaker identification under adverse far-field recording conditions with extremely short training and testing utterances. The algorithms include both generative and discriminative methods. For the generative methods, three variants of the original Gaussian Mixture Model (GMM) and the Universal Background Model adapted Gaussian Mixture Model (UBM-GMM) are evaluated. For the discriminative methods, two kernel-based algorithms are considered: the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM). The comparison focuses on identification accuracy and on the training and testing speed of each algorithm, and, for the kernel-based methods, on the sparseness of the resulting model. Finally, we demonstrate through experiments that multi-channel fusion of the far-field recordings improves the performance of all of the above algorithms.
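
To make the generative approach and the fusion step concrete, the following is a minimal sketch of GMM-based identification with score-level fusion across channels. The abstract does not state how the channels are fused, so summing per-channel average log-likelihoods is an assumption made here for illustration. The model layout, the function names, and the toy one-dimensional data are also hypothetical and are not taken from the paper.

```python
import math

def gmm_frame_loglik(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples. This layout is an
    illustrative assumption, not the paper's data structure.
    """
    comp = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        comp.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(comp)
    return peak + math.log(sum(math.exp(c - peak) for c in comp))

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one model."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)

def identify(channels, speaker_models):
    """Pick the speaker with the highest fused score.

    channels: list of utterances, one per microphone, each a list of frames.
    Fusion rule (assumed): sum of per-channel average log-likelihoods.
    """
    fused = {
        name: sum(utterance_score(frames, gmm) for frames in channels)
        for name, gmm in speaker_models.items()
    }
    return max(fused, key=fused.get), fused

# Toy one-dimensional models and recordings (made-up values).
models = {
    "alice": [(1.0, [0.0], [1.0])],
    "bob": [(1.0, [3.0], [1.0])],
}
clean_channel = [[0.1], [-0.2]]   # frames close to alice's mean
noisy_channel = [[1.4], [1.7]]    # ambiguous frames, slightly closer to bob

single_guess, _ = identify([noisy_channel], models)
fused_guess, _ = identify([clean_channel, noisy_channel], models)
```

In this toy case the noisy channel alone favours the wrong speaker, while the fused score recovers the correct one. It only illustrates the mechanism and says nothing about the accuracy figures reported in the paper.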
