Codebook-Based Bayesian Speech Enhancement for Nonstationary Environments

In this paper, we propose a Bayesian minimum mean squared error approach for the joint estimation of the short-term predictor parameters of speech and noise from the noisy observation. We use trained codebooks of speech and noise linear predictive coefficients to model the a priori information required by the Bayesian scheme. In contrast to current Bayesian estimation approaches, which treat the excitation variances as part of the a priori information, the proposed method computes them online for each short-time segment from the observation at hand. Consequently, the method performs well in nonstationary noise conditions. The resulting estimates of the speech and noise spectra can be used in a Wiener filter or any state-of-the-art speech enhancement system. We develop both memoryless estimators (using information from the current frame alone) and memory-based estimators (using information from the current and previous frames). We also address the estimation of functions of the short-term predictor parameters, in particular one that leads to the minimum mean squared error estimate of the clean speech signal. Experiments indicate that the proposed scheme performs significantly better than competing methods.
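As a rough illustration of how estimated short-term predictor parameters can feed a Wiener filter, the following minimal sketch builds autoregressive (AR) power spectra from linear predictive coefficients and excitation variances, then forms the per-frequency Wiener gain. It is not the estimator proposed in the paper: the function names `ar_power_spectrum` and `wiener_gain`, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the toy coefficient values are all assumptions made for illustration, and the codebook search and Bayesian weighting are omitted entirely.

```python
import cmath
import math


def ar_power_spectrum(lpc, excitation_var, n_bins):
    """Return sigma^2 / |A(e^{jw})|^2 at n_bins frequencies in [0, pi).

    Assumed convention: A(z) = 1 + a_1 z^-1 + ... + a_p z^-p,
    with lpc = [a_1, ..., a_p]. An empty lpc list gives a flat spectrum.
    """
    spectrum = []
    for k in range(n_bins):
        omega = math.pi * k / n_bins
        a = 1.0 + sum(
            c * cmath.exp(-1j * omega * (i + 1)) for i, c in enumerate(lpc)
        )
        spectrum.append(excitation_var / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-frequency Wiener gain H = P_speech / (P_speech + P_noise)."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


# Toy values (hypothetical, not taken from the paper): a first-order
# low-pass speech model and white noise with excitation variance 0.5.
speech_psd = ar_power_spectrum([-0.9], 1.0, 8)
noise_psd = ar_power_spectrum([], 0.5, 8)
gain = wiener_gain(speech_psd, noise_psd)
```

In this toy setting the gain stays close to one at low frequencies, where the speech model dominates, and drops toward high frequencies, where the white noise dominates. In the scheme described above, the coefficients would come from the trained codebooks and the excitation variances would be recomputed for every short-time segment.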
