Voice conversion for various types of body transmitted speech

In this paper, we review our proposed statistical voice conversion approaches to enhancing various types of body transmitted speech captured with a non-audible murmur (NAM) microphone. Body transmitted speech conversion is a promising technique that could bring a new paradigm to human-to-human speech communication. In addition to our previously proposed methods for enhancing body transmitted unvoiced speech for silent speech communication and for enhancing body transmitted artificial speech as a speaking aid, we further propose conversion methods for enhancing body transmitted voiced speech for noise-robust speech communication. Experimental results demonstrate that the proposed methods yield significant improvements in the quality of body transmitted voiced speech.
