Blind Signal-to-Noise Ratio Estimation of Speech Based on Vector Quantizer Classifiers and Decision Level Fusion

A blind approach is proposed for estimating the signal-to-noise ratio (SNR) of a speech signal corrupted by additive noise. The method follows a pattern recognition paradigm that uses several linear prediction-based features, a vector quantizer classifier and estimation combination. Blind SNR estimation is useful in speaker identification systems that determine a confidence metric along with the speaker identity. The confidence metric is partially based on the mismatch between the training and testing conditions of the speaker identification system, and SNR estimation is important in evaluating the degree of this mismatch. The aim is to correctly estimate SNR values from 0 to 30 dB, a range that is both practical and crucial for speaker identification systems. Experiments consider (1) artificially generated additive white Gaussian noise, pink noise and bandpass noise and (2) fifteen noise types from the NOISEX database. The best results are obtained by combining four features. The average SNR estimation error depends on the type of noise: it is relatively low for pink noise and jet cockpit noise and high for destroyer operations room noise and military vehicle noise. For both the artificially generated noise and the NOISEX data, the error is lower than that achieved by the improved minima controlled recursive averaging (IMCRA) method, which uses SNR estimation for speech enhancement. Combining the four features with IMCRA lowers the error for 8 of the 15 noise types from NOISEX.
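The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature vectors, the squared Euclidean distortion and the use of a plain average as the combination rule are all assumptions made for illustration. The sketch shows (a) mixing noise into speech at a target SNR, (b) a vector quantizer classifier that picks the SNR class whose codebook gives the lowest average distortion, and (c) a decision-level combination of per-feature estimates.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(speech, noisy):
    """SNR in dB when the clean signal is known (used only to check the mixing)."""
    residual = noisy - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


def vq_estimate(features, codebooks):
    """Return the SNR label whose codebook has the lowest average distortion.

    features:  (N, D) array of feature vectors from the test utterance.
    codebooks: dict mapping an SNR label in dB to a (K, D) codebook array.
    Distortion here is squared Euclidean distance (an assumption).
    """
    best_label, best_distortion = None, np.inf
    for label, codebook in codebooks.items():
        diff = features[:, None, :] - codebook[None, :, :]   # (N, K, D)
        dist = np.sum(diff ** 2, axis=2)                      # (N, K)
        distortion = dist.min(axis=1).mean()                  # nearest codeword per vector
        if distortion < best_distortion:
            best_label, best_distortion = label, distortion
    return best_label


def fuse(estimates_db):
    """Combine per-feature SNR estimates by averaging (one possible combination rule)."""
    return float(np.mean(estimates_db))
```

In use, one codebook per SNR class would be trained for each feature type (for example with a standard vector quantizer design algorithm), `vq_estimate` would be run once per feature, and the resulting estimates passed to `fuse`.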
