A Hybrid Approach for Single Channel Speech Enhancement using Deep Neural Network and Harmonic Regeneration Noise Reduction

This paper presents a hybrid approach to single-channel speech enhancement that combines a deep neural network (DNN) with harmonic regeneration noise reduction (HRNR). The DNN was used as a supervised algorithm to predict a new target mask, namely the constrained Wiener filter (cWF) target mask, from a noisy mixture signal transformed into gammatone filter bank features. The HRNR algorithm was then applied as a post-filter to eliminate residual noise. DNN-based supervised speech enhancement is an emerging way to address heavily nonstationary noise and low signal-to-noise ratio (SNR) conditions. To validate the proposed algorithm with the new target mask, 600 Malay utterances from male and female speakers were used for training, and 120 Malay utterances were used for prediction. The short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores were calculated as the performance metrics. The proposed target mask outperformed the baseline target masks. The hybrid speech enhancement algorithm achieved PESQ and STOI scores of 1.17 and 0.79, respectively, for babble noise at -5 dB SNR.
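
As a rough illustration of what a Wiener-type target mask looks like, the sketch below computes the standard Wiener gain for each time-frequency unit from speech and noise power and applies it to a noisy representation. The exact cWF definition is not given in this abstract, so the clipping range (`floor`) is only a placeholder assumption standing in for the constraint, and the function names `wiener_mask` and `apply_mask` are hypothetical. The random arrays stand in for gammatone filter bank energies.

```python
import numpy as np

def wiener_mask(speech_power, noise_power, floor=0.0, eps=1e-12):
    """Standard Wiener gain S / (S + N) per time-frequency unit.

    The clip to [floor, 1] is an illustrative assumption, not the
    constraint defined in the paper.
    """
    gain = speech_power / (speech_power + noise_power + eps)
    return np.clip(gain, floor, 1.0)

def apply_mask(noisy_tf, mask):
    """Scale each time-frequency unit of the noisy signal by its mask value."""
    return noisy_tf * mask

# Toy data: 64 filter bank channels by 10 frames (placeholder values).
rng = np.random.default_rng(0)
speech_power = rng.random((64, 10))
noise_power = rng.random((64, 10))

mask = wiener_mask(speech_power, noise_power, floor=0.1)
enhanced = apply_mask(speech_power + noise_power, mask)
```

In a supervised setup of the kind described above, a mask like this would be computed from the clean and noise signals during training and serve as the DNN's regression target; at prediction time the network's estimated mask would be applied to the noisy features instead.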