Maximum Entropy Based Data Selection for Speaker Recognition

This paper presents a data selection method for speaker recognition. Because more training data does not guarantee better results, how the data are selected becomes important. In GMM-UBM speaker recognition, the universal background model (UBM) is trained to represent the speaker-independent distribution of acoustic features, while the Gaussian mixture model (GMM) speaker model is tailored to a specific speaker. We apply the maximum entropy criterion to remove redundant feature frames in UBM training and to select discriminative feature frames in GMM speaker modeling. Experiments on the 2008 NIST Speaker Recognition Evaluation corpus show that the proposed method outperforms a baseline system without data selection.
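The abstract does not spell out the selection algorithm, so the following is only an illustrative sketch of the general idea of entropy-driven removal of redundant frames, not the paper's procedure. It assumes a hypothetical scheme in which feature frames are quantized onto a coarse grid and at most `cap` frames are kept per grid cell; the names `quantize`, `select_frames`, `cell_entropy`, `step`, and `cap` are all invented for this example. Capping over-represented cells flattens the cell histogram, which cannot lower its Shannon entropy.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a histogram given as an iterable of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def quantize(frame, step):
    """Map a feature frame (sequence of floats) to a coarse grid cell (assumed scheme)."""
    return tuple(math.floor(x / step) for x in frame)


def select_frames(frames, step=1.0, cap=5):
    """Return indices of kept frames: at most `cap` frames per grid cell.

    Frames falling into an already full cell are treated as redundant and dropped.
    """
    kept = []
    seen = Counter()
    for idx, frame in enumerate(frames):
        cell = quantize(frame, step)
        if seen[cell] < cap:
            seen[cell] += 1
            kept.append(idx)
    return kept


def cell_entropy(frames, step=1.0):
    """Entropy of the grid-cell histogram induced by a set of frames."""
    return entropy(Counter(quantize(f, step) for f in frames).values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D "frames": a dense, highly redundant cluster plus a sparse spread.
    frames = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(900)]
    frames += [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(100)]

    kept = select_frames(frames, step=1.0, cap=5)
    subset = [frames[i] for i in kept]
    print("frames kept:", len(subset), "of", len(frames))
    print("cell entropy before:", cell_entropy(frames))
    print("cell entropy after: ", cell_entropy(subset))
```

In a real GMM-UBM pipeline the frames would be acoustic feature vectors such as MFCCs, and the selected subset would be passed to UBM training. The grid quantizer here is only a stand-in for whatever density or entropy estimate a practical system would use.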
