Research on Noise Processing and Speech Frame Classification Model of Noisy Short Utterance Signal Processing

Noise processing is key to improving the recognition rate for noisy utterances. For short utterances, the corpus is small and little speech data is available for training and testing, so making full use of the limited corpus is key to improving the recognition rate. For noisy short utterances, both noise processing and full use of the limited corpus are therefore vital. We propose a noise separation algorithm based on constrained non-negative matrix factorization (CNMF) for noise processing. To make full use of the limited corpus, we propose an improved SNR discrimination algorithm (ISNRDA) and a differences detection and discrimination algorithm (DDADA); these two classification algorithms estimate the quality of each speech frame and classify the frames accordingly. We then combine the classification results with the GMM-UBM three-stage classification model proposed in this paper, so that the limited corpus of a noisy short utterance is fully exploited. Experiments show that the proposed algorithms improve speaker recognition performance on noisy short utterances.
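As a rough illustration of NMF-based noise separation in general, the sketch below factorizes a noisy magnitude spectrogram while holding a pre-trained noise basis fixed, then splits the mixture with a soft mask. It is not the CNMF algorithm of this paper: the abstract does not state the constraints, so plain Euclidean multiplicative updates are used instead, and the function names (`nmf`, `separate`), the ranks, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Euclidean NMF, V ~= W @ H, via multiplicative updates.

    If W_fixed is given, the first W_fixed.shape[1] columns of W are
    held at those values (a pre-trained noise basis) and never updated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((n_freq, rank)) + EPS
    if n_fixed:
        W[:, :n_fixed] = W_fixed
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W_new = W * (V @ H.T) / (W @ H @ H.T + EPS)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # update only the free columns
    return W, H


def separate(V_noisy, W_noise, n_speech=8, n_iter=200):
    """Split a noisy magnitude spectrogram into speech and noise estimates."""
    n_noise = W_noise.shape[1]
    W, H = nmf(V_noisy, n_noise + n_speech, n_iter, W_fixed=W_noise)
    noise_hat = W[:, :n_noise] @ H[:n_noise]
    speech_hat = W[:, n_noise:] @ H[n_noise:]
    mask = speech_hat / (speech_hat + noise_hat + EPS)  # soft mask in [0, 1]
    return mask * V_noisy, (1.0 - mask) * V_noisy


# Synthetic non-negative "spectrograms" (hypothetical data, 64 bins).
rng = np.random.default_rng(1)
V_noise = rng.random((64, 4)) @ rng.random((4, 50))
V_speech = rng.random((64, 6)) @ rng.random((6, 40))

W_noise, _ = nmf(V_noise, rank=4)      # learn the noise basis from noise only
V_noisy = V_speech + V_noise[:, :40]   # additive mixture
speech_est, noise_est = separate(V_noisy, W_noise)
print(speech_est.shape, noise_est.shape)
```

In practice the inputs would be magnitude spectrograms from a short-time Fourier transform, and the speech estimate would be combined with the noisy phase before inversion; those steps are left out here.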
