SPEECH ENHANCEMENT BASED ON LABEL CONSISTENT K-SVD UNDER NOISY ENVIRONMENT

Sparse-representation algorithms for speech enhancement have become increasingly popular in recent years. In previous research, however, sparse speech enhancement has been time-consuming, so we propose using LC K-SVD (Label Consistent K-SVD) to reduce the computation time. We focus on white Gaussian noise. The experiments show that the denoising performance of the proposed method is very close to that of the sparse algorithm in terms of SNR, LLR, SNRseg and PESQ, and in some cases even better. Our method needs only about half the time of the sparse algorithm.

Introduction

Speech is the most important tool people use to communicate with each other. If we want to communicate through machines, we need speech processing, for example in speech recognition systems. When speech is captured in a noisy environment, the noise lowers the recognition rate, so speech enhancement is necessary. Speech enhancement methods such as the Kalman filter [1], spectral subtraction [2] and the Wiener filter [2] have been proposed, and they are effective to some degree.

Recently, more and more researchers have turned to sparse representations, where the central problem is dictionary learning. Michal Aharon, Michael Elad and Alfred Bruckstein proposed the K-SVD method [3], whose dictionary update step updates the dictionary and its coefficients together. Ching-Tang Hsieh and Yan-heng Chen applied sparse theory to speech enhancement [4]. Their experimental results show that the method is superior to the methods mentioned above, but its running time is long. We propose LC K-SVD (Label Consistent K-SVD) [5] [6] to reduce the running time.

Section II introduces LC K-SVD and how it works. Section III presents the experimental results and compares the proposed method with Hsieh's sparse method. Section IV gives the conclusion and future research work.

LC K-SVD

First, we use the SBAV (Sub-Band Amplitude Variance) algorithm [7] to classify the speech into two labels, unvoiced and voiced.
Then, we slide a window over the noisy speech signal to divide it into N frames. The window length is K and the shift is K/2 per slide. The frames are stored in a matrix Y. Second, we feed the labels and the matrix Y into LC K-SVD and obtain the updated dictionary D and the matrix X of sparse coefficients with respect to dictionary D. The training process is shown in Fig. 1.

Fig. 1 The training process of the updated dictionary D and coefficients X.

6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC 2016). © 2016. The authors. Published by Atlantis Press.

Finally, we multiply the two matrices D and X to reconstruct the clean speech signal. The reconstruction process is shown in Fig. 2.

Fig. 2 The reconstruction process of the clean speech signal.

Each input signal obtains its own learned dictionary and set of coefficients. We then use the trained dictionary D and the coefficients X to estimate the clean speech signal.

Test results

We use four objective quality measures [8]-[9] to evaluate the denoised signals: SNR, Log-Likelihood Ratio (LLR), segmental SNR (SNRseg) and Perceptual Evaluation of Speech Quality (PESQ). The clean speech is taken from the CHiME data [10], which includes 600 utterances by 34 speakers reading six-word sequences of the form command-color-preposition-letter-number-adverb. All data have a 16 kHz sampling rate. The input signal is limited to the amplitude range -1 to 1. The speech signal is passed through a high-pass filter to eliminate the effect of the lips and vocal cords during phonation.

We add white Gaussian noise at SNR levels of -10, -5, 0, 5 and 10 dB to the 600 utterances. All utterances are then denoised with both LC K-SVD and sparse K-SVD. The average results of the four objective quality measures are shown in Figs. 3-6. The running time is tabulated in Table 1.

Fig. 3 The average results under the LLR quality measure.
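The framing step of Section II and the noise mixing used in the tests can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions, not the implementation used in the experiments: the function names, the 440 Hz test tone and the window length K = 256 are hypothetical, and averaging the overlapping samples when rebuilding the signal is an assumption, since the text does not state how the overlapping frames are recombined. LC K-SVD itself is not implemented here. The frames are passed through unchanged in place of the product of D and X.

```python
import math
import random


def add_white_noise(clean, target_snr_db, rng):
    """Add white Gaussian noise scaled so the mixture has the target SNR (dB)."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_std = math.sqrt(signal_power / 10.0 ** (target_snr_db / 10.0))
    return [s + rng.gauss(0.0, noise_std) for s in clean]


def frame_signal(x, K):
    """Cut x into frames of length K with a shift of K/2 per slide.

    Each returned frame corresponds to one column of the matrix Y.
    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = K // 2
    n_frames = 1 + (len(x) - K) // hop
    return [x[i * hop:i * hop + K] for i in range(n_frames)]


def overlap_add(frames, K):
    """Rebuild a signal from half-overlapping frames by averaging the
    samples that are covered by more than one frame (assumed scheme)."""
    hop = K // 2
    length = (len(frames) - 1) * hop + K
    total = [0.0] * length
    count = [0] * length
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            total[i * hop + j] += value
            count[i * hop + j] += 1
    return [t / c for t, c in zip(total, count)]


def measure_snr_db(reference, estimate):
    """Global SNR in dB of an estimate with respect to the clean reference."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical test input: one second of a 440 Hz tone at 16 kHz,
# with amplitude inside the range -1 to 1.
rng = random.Random(0)
clean = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_white_noise(clean, 5.0, rng)

frames = frame_signal(noisy, 256)   # columns of Y
rebuilt = overlap_add(frames, 256)  # stand-in for recombining the columns of D X
print(len(frames), round(measure_snr_db(clean, noisy), 2))
```

In a full pipeline, the columns of the product of D and X would replace `frames` before the call to `overlap_add`, and `measure_snr_db` would compare the clean signal with the rebuilt estimate.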

[1]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Xueying Zhang,et al.  A Speech Endpoint Detection Method Based on Wavelet Coefficient Variance and Sub-Band Amplitude Variance , 2006, First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC'06).

[3]  M. Elad,et al.  K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[4]  S. El-Rabaie,et al.  Speech enhancement with an adaptive Wiener filter , 2013, International Journal of Speech Technology.

[5]  Toshihiro Furukawa,et al.  Kalman filter for robust noise suppression in white and colored noises , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[6]  Yi Hu,et al.  Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. , 2009, The Journal of the Acoustical Society of America.

[7]  Larry S. Davis,et al.  Learning a discriminative dictionary for sparse coding via label consistent K-SVD , 2011, CVPR 2011.

[8]  Ching-Tang Hsieh,et al.  SPEECH ENHANCEMENT BASED ON SPARSE THEORY UNDER NOISY ENVIRONMENT , 2015 .

[9]  Ning Ma,et al.  The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..

[10]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[11]  Larry S. Davis,et al.  Label Consistent K-SVD: Learning a Discriminative Dictionary for Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.