A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals

This paper presents a novel noise suppression method to enhance soft speech recorded with a special body-conductive microphone called a nonaudible murmur (NAM) microphone. The NAM microphone can detect extremely soft speech, but because such speech is so faint, the recorded signal is easily corrupted by external noise. To suppress noise in the body-conducted signal effectively, an external noise monitoring framework using an air-conductive microphone has previously been proposed. In this study, we propose a noise suppression method for this framework based on a probabilistic observation model that is robust against phase variations. In the proposed method, the noise suppression process is formulated as a special case of non-negative tensor factorization of the observed air- and body-conducted signals. Experimental results demonstrate that 1) the proposed method consistently outperforms the conventional method in real noisy environments and 2) the proposed method effectively handles the acoustic changes in speech caused by the Lombard reflex.
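To make the idea of factorizing air- and body-conducted observations jointly more concrete, the following is a minimal sketch of a generic non-negative tensor factorization, not the model proposed in the paper. It assumes the two microphone signals have already been converted to non-negative magnitude spectrograms stacked into a tensor of shape (channels, frequencies, frames), and it uses a plain CP-style decomposition fitted with Euclidean multiplicative updates in place of the paper's probabilistic observation model. The names `ntf`, `reconstruct`, and `soft_mask`, and the choice of which components count as speech, are hypothetical and only for illustration.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero


def ntf(X, n_components, n_iter=200, seed=0):
    """Generic CP-style NTF: X[c, f, t] ~= sum_k G[c, k] * W[f, k] * H[t, k].

    Assumption: Euclidean cost with multiplicative updates; this is a
    stand-in for illustration, not the paper's observation model.
    """
    rng = np.random.default_rng(seed)
    C, F, T = X.shape
    G = rng.random((C, n_components)) + EPS  # per-channel gains
    W = rng.random((F, n_components)) + EPS  # spectral bases
    H = rng.random((T, n_components)) + EPS  # temporal activations
    for _ in range(n_iter):
        G *= np.einsum('cft,fk,tk->ck', X, W, H) / (G @ ((W.T @ W) * (H.T @ H)) + EPS)
        W *= np.einsum('cft,ck,tk->fk', X, G, H) / (W @ ((G.T @ G) * (H.T @ H)) + EPS)
        H *= np.einsum('cft,ck,fk->tk', X, G, W) / (H @ ((G.T @ G) * (W.T @ W)) + EPS)
    return G, W, H


def reconstruct(G, W, H, components=None):
    """Rebuild the tensor from all components, or from a chosen subset."""
    if components is not None:
        G, W, H = G[:, components], W[:, components], H[:, components]
    return np.einsum('ck,fk,tk->cft', G, W, H)


def soft_mask(G, W, H, speech_components, channel):
    """Ratio mask for one channel; `speech_components` is a hypothetical labeling."""
    speech = reconstruct(G, W, H, speech_components)[channel]
    total = reconstruct(G, W, H)[channel]
    return speech / (total + EPS)
```

In this sketch the basis and activation matrices are shared across both channels and only the gains in `G` differ, which is one simple way an air-conducted noise reference could inform the decomposition of the body-conducted channel. The mask returned by `soft_mask` would be multiplied element-wise with the body-conducted spectrogram; how components are assigned to speech or noise is left open here.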
