The report proposes a simple and practical model for generating relatively monotonous, but sufficiently natural, prosodic features by analyzing restricted natural speech. The basic assumption of the model is that a natural F0 pattern can be obtained without complicated linguistic analysis. To achieve this prosodic control, the authors analyzed and modeled recorded restricted speech as follows. First, they hypothesized that a Japanese major phrase (MP) can be modeled as a combination of minor phrases (mp), with the sequence limited to fewer than three minor phrases. The number of combinations is determined by the accentual type of each minor phrase and its position within the sentence, giving 28 combination patterns. To confirm the hypothesis, restricted speech (RSP) samples were collected by having the speaker utter the subject sentences without emotional effect or attention to prosodic features, and were then analyzed. Furthermore, to evaluate the performance of the model, a pattern-matching process (two-level DP) was applied between the synthesized F0 pattern and the restricted real F0 pattern. The authors thus confirmed that the model creates a synthesized F0 pattern sufficiently similar to the restricted-speech patterns. Speech synthesized with this model sounds relatively monotonous, but is sufficiently natural compared with general spontaneous speech.
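
The pattern-matching evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not give the details of the two-level DP procedure, so the code below swaps in a plain single-level dynamic-programming alignment (dynamic time warping) as a simpler stand-in. The function name `dtw_distance`, the absolute-difference local cost, and the sample F0 values are all assumptions made for illustration, not the authors' method.

```python
import math


def dtw_distance(synth_f0, real_f0):
    """Accumulated alignment cost between two F0 contours (in Hz).

    Assumption: a plain single-level DTW with absolute difference as
    the local cost. This is NOT the two-level DP used in the report.
    """
    n, m = len(synth_f0), len(real_f0)
    if n == 0 or m == 0:
        raise ValueError("both contours must be non-empty")

    # cost[i][j] = best accumulated cost aligning the first i frames
    # of synth_f0 with the first j frames of real_f0.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(synth_f0[i - 1] - real_f0[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in the synthesized contour only
                cost[i][j - 1],      # advance in the real contour only
                cost[i - 1][j - 1],  # advance in both contours
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical frame-level F0 values in Hz, for illustration only.
    synthesized = [120.0, 135.0, 150.0, 140.0, 125.0]
    recorded = [118.0, 130.0, 149.0, 151.0, 138.0, 124.0]
    print(dtw_distance(synthesized, recorded))
```

A lower accumulated cost indicates that the synthesized contour can be warped onto the recorded one with less deviation. In practice one might compare log-F0 values and normalize by the path length, but those choices are likewise not specified in the source.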