Speaker Recognition of Noisy Short Utterance Based on Speech Frame Quality Discrimination and Three-stage Classification Model

Noisy short utterances are corrupted by noise and provide little speech data, so the speaker recognition rate drops significantly. To improve the recognition rate, we propose a dual-information quality discrimination algorithm for classifying speech frames. It has two parts: the differences detection and discrimination algorithm (DDADA) and the improved SNR discrimination algorithm (ISNRDA). Based on these two algorithms, speech frames are divided into three classes: high quality, medium quality, and low quality. We also propose a GMM-UBM three-stage classification model and combine it with the dual-information quality discrimination algorithm. Experiments show that the dual-information quality discrimination algorithm classifies speech frames more precisely, and that combining it with the GMM-UBM three-stage classification model makes full use of the limited data in short utterances and improves the speaker recognition rate for noisy short utterances.
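The idea of sorting frames into three quality classes can be illustrated with a minimal sketch. It is not the DDADA or ISNRDA procedure, whose details the abstract does not give. It assumes a simple per-frame SNR estimate and two thresholds. The function names, the known `noise_power` value, and the threshold values `low_db` and `high_db` are all hypothetical choices made for illustration.

```python
import math


def frame_snr_db(frame, noise_power):
    """Estimate the SNR of one frame in dB.

    Assumption: noise_power is a known, positive estimate of the noise
    power (for example, measured on a non-speech segment).
    """
    power = sum(x * x for x in frame) / len(frame)
    # Floor the signal power so the logarithm stays defined.
    signal_power = max(power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)


def classify_frames(frames, noise_power, low_db=0.0, high_db=10.0):
    """Label each frame "high", "medium", or "low" quality.

    Assumption: the two thresholds are illustrative, not values from
    the paper.
    """
    labels = []
    for frame in frames:
        snr = frame_snr_db(frame, noise_power)
        if snr >= high_db:
            labels.append("high")
        elif snr >= low_db:
            labels.append("medium")
        else:
            labels.append("low")
    return labels


if __name__ == "__main__":
    toy_frames = [[1.0] * 4, [0.2] * 4, [0.1] * 4]
    print(classify_frames(toy_frames, noise_power=0.01))
```

In a pipeline like the one described, the three label groups could then be routed to different stages of a classifier. How the paper's three-stage GMM-UBM model uses each class is not specified in the abstract.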
