Speech Enhancement Based on NMF Under Electric Vehicle Noise Condition

Speech-based human–machine interaction (HMI) is essential to electronic navigation, autonomous cars, and intelligent vehicles. Noise generated by mechanical motion or electric power equipment degrades speech quality and can prevent HMI from working effectively. However, relatively little literature addresses speech enhancement under electric vehicle noise conditions. This paper presents a speech enhancement method based on improved nonnegative matrix factorization (ImNMF). Traditional nonnegative matrix factorization (NMF) trains its speech dictionary on speech recorded in advance, which inevitably contains some noise. ImNMF instead generates the speech dictionary from the spectra of pitches and their harmonics via a mathematical model, which guarantees the purity of the speech dictionary. In addition, to alleviate the loss of information in the noise sample, ImNMF constructs the noise dictionary from a combination of gain-adjusted spectrum frames of noise samples separated online. Compared with traditional NMF, the ImNMF noise atoms are relatively larger, so the extent to which the speech signal is represented by noise atoms is greatly reduced. ImNMF can therefore reduce distortion of the reconstructed speech while improving the quality of the recovered speech. Speech enhancement and speaker verification experiments on the NUST603 and TIMIT data show that the proposed ImNMF effectively enhances speech in the noise environment of electric vehicles and further reduces the equal error rate of the speaker verification system.
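The general idea described above, a speech dictionary synthesized from pitch harmonics plus a noise dictionary built from gain-adjusted noise frames, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Gaussian shape and 1/h amplitude decay of the harmonic atoms, the unit-sum gain adjustment, the Kullback–Leibler multiplicative update, and the Wiener-style mask are all assumptions chosen for illustration.

```python
import numpy as np

def harmonic_dictionary(freqs_hz, pitches_hz, n_harmonics=10, width_hz=40.0):
    """Hypothetical speech dictionary: one atom per candidate pitch,
    modelled as Gaussian bumps at f0, 2*f0, ... (shape and 1/h decay assumed)."""
    atoms = []
    for f0 in pitches_hz:
        atom = np.zeros_like(freqs_hz, dtype=float)
        for h in range(1, n_harmonics + 1):
            atom += (1.0 / h) * np.exp(-0.5 * ((freqs_hz - h * f0) / width_hz) ** 2)
        atoms.append(atom / (atom.sum() + 1e-12))  # normalise each atom to unit sum
    return np.stack(atoms, axis=1)  # shape (n_freq, n_pitches)

def noise_dictionary(noise_mag):
    """Hypothetical noise dictionary: each noise magnitude frame becomes an atom,
    scaled to unit sum (a simple stand-in for the gain adjustment)."""
    return noise_mag / (noise_mag.sum(axis=0, keepdims=True) + 1e-12)

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 with V ~= W @ H while W stays fixed,
    using the standard multiplicative update for KL divergence."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]  # W^T 1, broadcast over frames
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (col_sums + eps)
    return H

def enhance(V, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Split V into speech and noise parts and apply a Wiener-style mask."""
    W = np.concatenate([W_speech, W_noise], axis=1)
    H = nmf_activations(V, W, n_iter)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech part of the reconstruction
    N = W_noise @ H[k:]    # noise part of the reconstruction
    mask = S / (S + N + eps)
    return mask * V        # enhanced magnitude spectrogram
```

Here `V` is assumed to be a nonnegative magnitude spectrogram of shape (frequency bins, frames); the enhanced magnitudes would normally be combined with the noisy phase and inverted back to a waveform, a step this sketch omits.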
