An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques

In this study, we evaluate our proposed methods for enhancing alaryngeal speech using statistical voice conversion (VC) techniques. VC based on a Gaussian mixture model (GMM) has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-to-Speech so that the original voice quality of a laryngectomee can be recovered even if only one arbitrary utterance of the original voice is available. VC/EVC-based AL-to-Speech systems have been developed for several types of alaryngeal speech, such as esophageal speech (ES), electrolaryngeal speech (EL), and body-conducted silent electrolaryngeal speech (silent EL). These systems are compared with one another from various perspectives. The experimental results demonstrate that the proposed systems yield significant enhancement effects on each type of alaryngeal speech.
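As a rough illustration of what GMM-based conversion involves, the sketch below implements the frame-wise conditional-expectation mapping of a joint-density GMM, a commonly used formulation. It is an assumption for illustration only and not necessarily the conversion procedure used in the evaluated systems; the function name `gmm_convert` and all parameter names are hypothetical, and the GMM parameters are assumed to have been trained beforehand on paired source and target features.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source feature vector x to its expected target vector.

    Hypothetical parameter layout for an M-component joint-density GMM:
      weights: (M,)         mixture weights
      mu_x:    (M, Dx)      source means
      mu_y:    (M, Dy)      target means
      cov_xx:  (M, Dx, Dx)  source covariances
      cov_yx:  (M, Dy, Dx)  target-source cross-covariances
    """
    x = np.asarray(x, dtype=float)
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = np.empty((n_mix, mu_y.shape[1]))
    for m in range(n_mix):
        diff = x - mu_x[m]
        solved = np.linalg.solve(cov_xx[m], diff)  # cov_xx^-1 (x - mu_x)
        _, logdet = np.linalg.slogdet(cov_xx[m])
        # Unnormalized log posterior of component m given x; the
        # (2*pi)^Dx constant is shared by all components and cancels.
        log_post[m] = np.log(weights[m]) - 0.5 * (diff @ solved + logdet)
        # Conditional mean of the target given x under component m.
        cond_means[m] = mu_y[m] + cov_yx[m] @ solved
    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    return post @ cond_means
```

In practice `x` would be a spectral feature vector extracted from one frame of alaryngeal speech, and the function would be applied frame by frame before waveform synthesis.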
