Body Conducted Speech Enhancement by Equalization and Signal Fusion

This paper studies body-conducted speech for noise-robust speech processing. Because body-conducted speech is typically limited in bandwidth, signal processing is required to obtain a signal that is both high in quality and low in noise. We propose an algorithm that first equalizes the body-conducted speech using filters drawn from a pre-defined filter set, and then fuses the equalized signal with a noisy conventional microphone signal using an optimal clean speech amplitude and phase estimator. We evaluated the proposed equalization and fusion technique using a combination of a conventional close-talk microphone and a throat microphone. Subjective listening tests show that the proposed method successfully combines the speech quality of the conventional signal with the noise robustness of the throat microphone signal. The listening tests also indicate that including the body-conducted signal can improve single-channel speech enhancement methods, and a set of objective signal quality measures confirms these observations.
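
The two-stage structure described above (equalize, then fuse) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `select_equalizer` and `fuse`, the use of a long-term log-spectral distance to pick a filter from the set, and the Wiener-like per-bin weighting, which is only a simple stand-in for the optimal amplitude and phase estimator that the abstract mentions but does not specify.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def select_equalizer(body_mag, ref_mag, filter_set):
    """Pick one gain curve from a pre-defined set (hypothetical criterion).

    body_mag, ref_mag: (frames, bins) magnitude spectra of the body-conducted
        signal and of a reference close-talk recording.
    filter_set: (n_filters, bins) linear gains per frequency bin.
    Returns (index, gains) of the candidate whose equalized long-term
    log-spectrum is closest to the reference in the mean-square sense.
    """
    body_lts = np.log(np.mean(body_mag, axis=0) + EPS)
    ref_lts = np.log(np.mean(ref_mag, axis=0) + EPS)
    # Applying a gain multiplies the spectrum, i.e. adds in the log domain.
    errors = np.mean((body_lts + np.log(filter_set + EPS) - ref_lts) ** 2, axis=1)
    best = int(np.argmin(errors))
    return best, filter_set[best]


def fuse(eq_body_mag, mic_mag, noise_psd):
    """Per-bin magnitude fusion (assumed stand-in, not the paper's estimator).

    Bins where the microphone has a high estimated SNR lean on the
    microphone; noisy bins fall back to the equalized body-conducted signal.
    """
    snr = np.maximum(mic_mag ** 2 / (noise_psd + EPS) - 1.0, 0.0)
    weight = snr / (1.0 + snr)  # Wiener-like weight in [0, 1)
    return weight * mic_mag + (1.0 - weight) * eq_body_mag


# Toy usage: the reference is the body signal shaped by a known gain curve.
body = np.ones((4, 3))
true_gains = np.array([1.0, 2.0, 4.0])
ref = body * true_gains
candidates = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 4.0], [4.0, 2.0, 1.0]])
idx, gains = select_equalizer(body, ref, candidates)

fused = fuse(np.array([[2.0, 2.0]]), np.array([[10.0, 1.0]]), np.array([1.0, 1.0]))
```

In the toy usage, the second candidate matches the simulated channel, so it is the one selected. The fusion step keeps the microphone magnitude in the high-SNR bin and falls back to the equalized body-conducted magnitude in the bin dominated by noise.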
