Paper information - A linguistic and prosodic database for data-driven Japanese TTS synthesis

A linguistic and prosodic database for data-driven Japanese TTS synthesis

We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.

Keikichi Hirose | Atsuhiro Sakurai | Takashi Natsume

[1] Harald Singer, et al. Automatic prosodic segmentation by F0 clustering using superpositional modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[2] Keikichi Hirose, et al. Analysis of voice fundamental frequency contours for declarative sentences of Japanese, 1984.

[3] Yoshinori Sagisaka, et al. Automatic Extraction of F0 Control Rules Using Statistical Analysis, 1997.

[4] Keikichi Hirose, et al. A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] Keikichi Hirose, et al. Detection of phrase boundaries in Japanese by low-pass filtering of fundamental frequency contours, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6] Mari Ostendorf, et al. ToBI: a standard for labeling English prosody, 1992, ICSLP.