Speech Modeling with Magnitude-Normalized Complex Spectra and Its Application to Multisensory Speech Enhancement

A good speech model is essential for speech enhancement, but it is very difficult to build because of the huge intra- and inter-speaker variation. We present a new speech model for speech enhancement based on statistical models of magnitude-normalized complex spectra of speech signals. Most popular speech enhancement techniques work in the spectrum space, but the large variation of speech strength, even from the same speaker, makes accurate speech modeling very difficult because the magnitude is correlated across all frequency bins. By performing magnitude normalization for each speech frame, we remove the magnitude variation and are able to build a much better speech model with only a small number of Gaussian components. This new speech model is applied to speech enhancement for our previously developed microphone headsets, which combine a conventional air microphone with a bone sensor. Substantially improved results have been obtained.
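The per-frame magnitude normalization described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the choice of the L2 norm over the frame's magnitude spectrum as the normalizing gain, and the function name, are assumptions for the sake of the example.

```python
import numpy as np


def magnitude_normalize(frame_spectrum):
    """Normalize one complex spectrum frame to unit magnitude.

    A sketch of the paper's idea: divide each frame by a per-frame gain
    so frame-level energy variation is removed before statistical
    modeling. The L2 norm over |X(f)| is an assumed choice of gain.
    Returns the normalized frame and the gain (so scale can be restored
    after enhancement).
    """
    gain = np.linalg.norm(np.abs(frame_spectrum))
    if gain == 0.0:
        return frame_spectrum, 0.0  # silent frame: nothing to normalize
    return frame_spectrum / gain, gain


# Toy frame of complex spectral values (e.g., a few DFT bins).
x = np.array([1.0 + 1.0j, 2.0 - 1.0j, 0.5 + 0.0j])
x_norm, gain = magnitude_normalize(x)
```

After this step, every frame lies on the unit sphere in magnitude, so a Gaussian mixture fit to the normalized spectra no longer has to explain loudness differences, which is why far fewer components suffice.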
