Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion

We have previously proposed a speaking-aid system for laryngectomees that uses a statistical voice conversion technique. In this system, artificial speech articulated with extremely small sound source signals is detected with a Non-Audible Murmur (NAM) microphone, and the detected artificial speech is then converted into a more natural voice in a probabilistic manner. Although this system basically allows laryngectomees to speak while keeping the external source signals silent, it remains unclear how much these new sound source signals affect the quality of the converted speech. In this paper, we investigate the impact of various sound source signals on voice conversion accuracy. Various small sound source signals are designed by changing the spectral envelope and the waveform power independently, and we conduct both objective and subjective evaluations. The experimental results demonstrate that voice conversion accepts 1) various sound source signals with different spectral envelopes and 2) a wide range of sound source signal power, unless the power of the speaking parts is almost equal to that of the silent parts. Moreover, we investigate the effectiveness of enhancing auditory feedback while speaking with the extremely small sound source signals.
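As an illustration of what changing the spectral envelope and the waveform power independently can mean in practice, the following is a minimal sketch and not the design procedure used in the paper. The impulse-train source, the one-pole shaping filter, and all names and parameter values (`design_source`, `pole`, `target_db`, the 100 Hz pitch, the 16 kHz sampling rate) are assumptions introduced only for this example.

```python
import math


def pulse_train(f0_hz, duration_s, sample_rate):
    """Impulse train with one unit pulse per pitch period (assumed source)."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def one_pole_filter(signal, pole):
    """Shape the spectral envelope with y[n] = x[n] + pole * y[n-1].

    A positive pole emphasizes low frequencies, a negative pole emphasizes
    high frequencies. The pole must lie strictly between -1 and 1.
    """
    out = []
    prev = 0.0
    for x in signal:
        prev = x + pole * prev
        out.append(prev)
    return out


def power_db(signal):
    """Mean-square power of the waveform in dB."""
    mean_square = sum(x * x for x in signal) / len(signal)
    return 10.0 * math.log10(mean_square)


def set_power_db(signal, target_db):
    """Rescale the waveform to a target power without changing its shape."""
    gain = 10.0 ** ((target_db - power_db(signal)) / 20.0)
    return [gain * x for x in signal]


def design_source(pole, target_db, f0_hz=100.0, duration_s=0.5,
                  sample_rate=16000):
    """Hypothetical helper: pick the envelope (pole) and power separately."""
    shaped = one_pole_filter(pulse_train(f0_hz, duration_s, sample_rate), pole)
    return set_power_db(shaped, target_db)


# Two envelopes at the same power, and one envelope at two power levels.
dark = design_source(pole=0.9, target_db=-40.0)
bright = design_source(pole=-0.9, target_db=-40.0)
quiet = design_source(pole=0.9, target_db=-60.0)
```

Because the power is set by a single gain applied after the envelope has been shaped, the two factors can be varied without affecting each other, which is the property the evaluation described above relies on.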
