Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation

This paper describes an online algorithm for enhancing monaural noisy speech. First, a novel phase-corrected low-delay gammatone filterbank is derived for subband decomposition and resynthesis of the signal; the subband signals are then analyzed frame by frame. Second, a novel feature named periodicity degree (PD) is proposed for detecting and estimating the fundamental period (P0) in each frame and for estimating the signal-to-noise ratio (SNR) in each frame-subband signal unit. The PD is calculated in each unit as the product of the normalized autocorrelation and the comb filter ratio, and is shown to be robust in various low-SNR conditions. Third, the noise energy level in each signal unit is estimated recursively: from the estimated SNR for units with high PD, and from the noisy signal energy level for units with low PD. The a priori SNR is then estimated using a decision-directed approach with the estimated noise level. Finally, a revised Wiener gain is calculated, smoothed, and applied to each unit; the processed units are summed across subbands and frames to form the enhanced signal. The P0 detection accuracy of the algorithm was evaluated on two corpora; compared to a recently published pitch detection algorithm, it performed comparably on one corpus and better on the other. The speech enhancement effect of the algorithm was evaluated on one corpus with two objective criteria; compared to a state-of-the-art statistical-model-based algorithm, it performed better in one highly non-stationary noise condition and comparably in two other noise conditions.
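The two central computations named above can be illustrated with a minimal sketch. It is not the algorithm of this paper: the abstract does not give the exact definitions, so the bounded form of the comb filter ratio, the clipping of negative autocorrelation, the standard (unrevised) Wiener gain, and all function names and parameter values (`periodicity_degree`, `wiener_gains`, `alpha`, `floor`) are assumptions made for illustration. The noise energy is taken as given rather than estimated from the PD.

```python
import math


def periodicity_degree(x, lag):
    """Toy periodicity degree of x at a candidate period `lag` (in samples).

    Assumed form: PD = max(NAC, 0) * CFR, where NAC is the normalized
    autocorrelation at `lag` and CFR is a comb filter ratio bounded to [0, 1].
    The paper's exact definitions may differ.
    """
    a = x[:-lag]  # x[n]
    b = x[lag:]   # x[n + lag]
    e_a = sum(v * v for v in a)
    e_b = sum(v * v for v in b)
    if e_a == 0.0 or e_b == 0.0:
        return 0.0
    nac = sum(u * v for u, v in zip(a, b)) / math.sqrt(e_a * e_b)
    # Comb filters: adding the delayed signal reinforces a component with
    # period `lag`, subtracting it cancels that component.
    e_add = sum((u + v) ** 2 for u, v in zip(a, b))
    e_sub = sum((u - v) ** 2 for u, v in zip(a, b))
    cfr = e_add / (e_add + e_sub)  # assumed bounded variant
    return max(nac, 0.0) * cfr


def wiener_gains(noisy_energy, noise_energy, alpha=0.98, floor=1e-3):
    """Decision-directed a priori SNR and standard Wiener gain per frame.

    Inputs are per-frame energies of one subband. `alpha` and `floor`
    are illustrative values, not taken from the paper.
    """
    gains = []
    prev_clean = 0.0  # estimated clean-speech energy of the previous frame
    for y, n in zip(noisy_energy, noise_energy):
        n = max(n, 1e-12)  # guard against division by zero
        gamma = y / n      # a posteriori SNR
        # Decision-directed rule: weight the previous clean estimate against
        # the instantaneous (maximum-likelihood) SNR estimate.
        xi = alpha * prev_clean / n + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, floor)
        g = xi / (1.0 + xi)  # standard Wiener gain
        gains.append(g)
        prev_clean = g * g * y
    return gains


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * k / 20.0) for k in range(200)]
    print(periodicity_degree(tone, 20))  # lag equal to the true period
    print(periodicity_degree(tone, 10))  # lag equal to half the period
    print(wiener_gains([1.0] * 5 + [100.0] * 5, [1.0] * 10))
```

In this sketch the PD approaches 1 when the candidate lag matches the period of a clean tone and drops to 0 at half the period, while the gain stays near the floor in noise-only frames and rises over a few frames after a signal onset because of the smoothing introduced by `alpha`.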