A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction

This paper presents a novel speaking aid system to help laryngectomees produce more naturally sounding electrolaryngeal (EL) speech. An electrolarynx is an external device to generate excitation signals, instead of vibration of the vocal folds. Although the conventional EL speech is quite intelligible, its naturalness suffers from the unnatural fundamental frequency (F0) patterns of the mechanically generated excitation signals. To improve the naturalness of EL speech, we have proposed EL speech enhancement methods using statistical F0 pattern prediction. In these methods, the original EL speech recorded by a microphone is presented from a loudspeaker after performing the speech enhancement. These methods are effective for some situation, such as telecommunication, but it is not suitable for face-to-face conversation because not only the enhanced EL speech but also the original EL speech is presented to listeners. In this paper, to develop an EL speech enhancement also effective for face-to-face conversation, we propose a method for directly controlling F0 patterns of the excitation signals to be generated from the electrolarynx using the statistical F0 prediction. To get an ”actual feel” of the proposed system, we also implement a prototype system. By using the prototype system, we find latency issues caused by a real-time processing. To address these latency issues, we furthermore propose segmental continuous F0 pattern modeling and forthcoming F0 pattern modeling. With evaluations through simulation, we demonstrate that our proposed system is capable of effectively addressing the issues of latency and those of electrolarynx in term of the naturalness. key words: laryngectomee, electrolarynx, voice restoration, speech enhancement, statistical F0 pattern prediction

[1]  Alexander Kain,et al.  Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[2]  Hideki Kawahara,et al.  Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..

[3]  Kou Tanaka,et al.  A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation , 2014, IEICE Trans. Inf. Syst..

[4]  Tomoki Toda,et al.  A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion , 2013, INTERSPEECH.

[5]  Tohru Ifukube,et al.  Design of a new electrolarynx having a pitch control function , 1994, Proceedings of 1994 3rd IEEE International Workshop on Robot and Human Communication.

[6]  T. Shipp,et al.  Predicting vocal frequency from selected physiologic measures. , 1979, The Journal of the Acoustical Society of America.

[7]  Mikihiro Nakagiri,et al.  Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Tomoki Toda,et al.  Implementation of Computationally Efficient Real-Time Voice Conversion , 2012, INTERSPEECH.

[9]  Tanja Schultz,et al.  Estimation of fundamental frequency from surface electromyographic data: EMG-to-F0 , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Tomoki Toda,et al.  Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation , 2006, INTERSPEECH.

[11]  Klaus J. Kohler Papers in Laboratory Phonology: Macro and micro F 0 in the synthesis of intonation , 1990 .

[12]  Hideki Kasuya,et al.  Development and evaluation of pitch adjustable electrolarynx , 2004, Speech Prosody 2004.

[13]  Hideki Kawahara,et al.  Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT , 2001, MAVEBA.

[14]  John Kingston,et al.  Macro and micro F0 in the synthesis of intonation , 1990 .

[15]  Keiichi Tokuda,et al.  Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[16]  Tomoki Toda,et al.  Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech , 2012, Speech Commun..

[17]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[18]  B. Roubeau,et al.  Electromyographic activity of strap and cricothyroid muscles in pitch change. , 1997, Acta oto-laryngologica.

[19]  Tomoki Toda,et al.  Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[20]  Tomoki Toda,et al.  Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Kenji Matsui,et al.  Development of electrolarynx with hands-free prosody control , 2013, SSW.