Method of Estimating Signal-to-Noise Ratio Based on Optimal Design for Sub-band Voice Activity Detection

The global signal-to-noise ratio (gSNR) is the ratio of the power of speech to the power of the concurrent noise in a noisy speech signal. Estimates of gSNR play an important role in power envelope restoration and in predicting speech intelligibility based on the speech transmission index (STI). Here, we propose a gSNR estimation framework that consists mainly of sub-band processing, voice activity detection (VAD), and threshold optimization. Sub-band processing made the detection of speech and noise much more accurate than global full-band processing. In addition, whereas most studies apply a single fixed VAD decision threshold under all testing conditions, an optimal threshold was designed here to detect speech and noise across all testing conditions (e.g., different SNRs). This optimal threshold was obtained by minimizing the root mean square (RMS) of the false acceptance rate (FAR) and false rejection rate (FRR) on the receiver operating characteristic (ROC) curve in each sub-band. The gSNR was then calculated by summing the speech and noise powers over all sub-bands, using the sub-band VAD decisions made with the optimal thresholds. Comprehensive evaluations were carried out using various types of noise and gSNR conditions. Classical VAD methods based on G.729B and on thresholding with Otsu’s method were used as baselines for comparison. The results revealed that the proposed scheme estimated gSNR more accurately than the baseline methods.
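The two core steps described above, choosing a per-band threshold that minimizes the RMS of FAR and FRR and then summing sub-band powers into a global SNR, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of frame power as the VAD feature, the definitions of FAR (noise frames accepted as speech) and FRR (speech frames rejected), and the estimate of speech power as active-frame power minus noise power are all assumptions made for the example.

```python
import numpy as np

def optimal_threshold(feature, is_speech, candidates):
    """Return the candidate threshold minimizing sqrt((FAR^2 + FRR^2) / 2).

    feature    : 1-D array of per-frame VAD feature values (assumed: frame power)
    is_speech  : 1-D boolean array of reference labels (True = speech frame)
    candidates : iterable of thresholds to try (points along the ROC curve)
    Assumes both speech and noise frames are present in the labels.
    """
    best_thr, best_cost = None, np.inf
    for thr in candidates:
        decided_speech = feature > thr
        far = np.mean(decided_speech[~is_speech])   # noise accepted as speech
        frr = np.mean(~decided_speech[is_speech])   # speech rejected as noise
        cost = np.sqrt((far ** 2 + frr ** 2) / 2.0)
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr

def global_snr_db(band_power, band_thresholds, eps=1e-12):
    """Estimate gSNR in dB from per-band frame powers and per-band thresholds.

    band_power      : array of shape (n_bands, n_frames)
    band_thresholds : one VAD threshold per sub-band
    Assumption: noise is additive and uncorrelated with speech, so speech
    power is approximated as (active-frame power) - (inactive-frame power).
    """
    speech_total, noise_total = 0.0, 0.0
    for power, thr in zip(band_power, band_thresholds):
        active = power > thr
        noise_pow = power[~active].mean() if (~active).any() else 0.0
        noisy_pow = power[active].mean() if active.any() else noise_pow
        speech_total += max(noisy_pow - noise_pow, 0.0)
        noise_total += noise_pow
    return 10.0 * np.log10(max(speech_total, eps) / max(noise_total, eps))
```

In practice the reference labels needed by `optimal_threshold` would come from labelled training data, and the resulting thresholds would then be applied to unseen signals through `global_snr_db`.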
