Paper information

[Figure: comparison of a laryngeal speaker and a total laryngectomee. Labels: nasal cavity, oral cavity, vocal folds, trachea, esophagus, tracheostoma, expired air.]

An electrolarynx (EL) is a medical device that generates sound source signals to provide laryngectomees with a voice. This article focuses on two problems with speech produced using an EL (EL speech): it sounds extremely unnatural, and the EL generates high-energy sound source signals that often annoy people nearby. To address these two problems, we propose three speaking-aid systems, each of which enhances a different type of EL speech signal: EL speech, EL speech using an air-pressure sensor (EL-air speech), and silent EL speech. The air-pressure sensor enables a laryngectomee to manipulate the F0 contours of EL speech using exhaled air that flows from the tracheostoma. Silent EL speech is produced with a new sound source unit that generates signals with extremely low energy. Our speaking-aid systems address the poor quality of EL speech using voice conversion (VC), which transforms acoustic features so that the speech sounds as if it were uttered by another person. Our systems estimate spectral parameters, F0, and aperiodic components independently. Experimental evaluations demonstrate that the use of an air-pressure sensor dramatically improves F0 estimation accuracy and that the converted speech signals are preferred to the source EL speech. © 2011 Elsevier B.V. All rights reserved.

K. Shikano | H. Saruwatari | T. Toda | Keigo Nakamura
