Statistical approaches to enhancement of body-conducted speech detected with non-audible murmur microphone

In this paper, we review our recent research on technologies for enhancing body-conducted speech detected with a non-audible murmur (NAM) microphone. The NAM microphone was developed to detect extremely soft whispered speech, which is useful for silent speech communication. It can also detect other types of speech, such as a soft voice and a normal voice, while effectively reducing external noise owing to its noise-proof structure. On the other hand, the quality of the detected speech is severely degraded by body-conducted recording. To address this issue, we have developed technologies for statistically converting body-conducted speech into normal speech. This paper gives an overview of these technologies and describes a further attempt to make them usable for human-to-human speech communication.
