A trial of communicative prosody generation based on control characteristics of one-word utterances observed in real conversational speech

Aiming at prosody control for conversational speech synthesis, communicative prosodies were generated based on the prosodic characteristics of the one-word utterance “n”. Grouping F0 patterns by vector quantization (VQ) revealed four dynamic F0 patterns (rise, gradual fall, fall, and rise&fall) in a large number of one-word “n” utterances from daily conversations. Analysis with an F0 generation model showed that these patterns have different control characteristics. Reflecting these control characteristics, a communicative prosody control scheme is proposed for short utterances along three representative dimensions of perceptual impression (confident-doubtful, allowable-unacceptable, and positive-negative) previously obtained by multidimensional scaling (MDS) analysis. Naturalness evaluation tests on synthesized conversational speech showed that the proposed prosody control was superior in naturalness. These results indicate the possibility of generating communicative prosody for conversational speech synthesis by expressing perceptual impressions with a corpus-based approach.
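The VQ-based grouping of F0 patterns can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the codebook training algorithm, or the distance measure, so everything below is an assumption made for illustration: contours are resampled to a fixed length, converted to mean-removed log F0, and grouped by a plain k-means codebook with Euclidean distance. The function names (`normalize_contour`, `vq_codebook`, `make_toy_contours`) are hypothetical, and the toy contours are synthetic, not data from the study.

```python
import numpy as np

def normalize_contour(f0_hz, n_points=16):
    """Resample one voiced F0 contour to n_points and remove its mean log F0.

    Assumption: shape, not absolute pitch or duration, is what gets grouped.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    src = np.linspace(0.0, 1.0, len(f0_hz))
    dst = np.linspace(0.0, 1.0, n_points)
    log_f0 = np.interp(dst, src, np.log(f0_hz))
    return log_f0 - log_f0.mean()

def assign(vectors, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_codebook(vectors, n_codewords=4, n_iter=50):
    """Train a small VQ codebook with k-means (an assumed training method)."""
    vectors = np.asarray(vectors, dtype=float)
    # Deterministic farthest-point initialization of the codewords.
    codebook = [vectors[0]]
    for _ in range(1, n_codewords):
        d = np.min([((vectors - c) ** 2).sum(axis=1) for c in codebook], axis=0)
        codebook.append(vectors[np.argmax(d)])
    codebook = np.array(codebook)
    for _ in range(n_iter):
        labels = assign(vectors, codebook)
        updated = codebook.copy()
        for k in range(n_codewords):
            members = vectors[labels == k]
            if len(members) > 0:  # keep the old codeword if a cell is empty
                updated[k] = members.mean(axis=0)
        if np.allclose(updated, codebook):
            break
        codebook = updated
    return codebook, assign(vectors, codebook)

def make_toy_contours(seed=0, per_shape=10):
    """Synthetic F0 contours (Hz) loosely imitating the four named patterns."""
    rng = np.random.default_rng(seed)
    shapes = {
        "rise": lambda t: 100 + 50 * t,
        "gradual_fall": lambda t: 130 - 15 * t,
        "fall": lambda t: 160 - 70 * t,
        "rise_fall": lambda t: 100 + 50 * (1 - np.abs(2 * t - 1)),
    }
    contours, names = [], []
    for name, fn in shapes.items():
        for _ in range(per_shape):
            n = int(rng.integers(12, 30))  # utterances differ in length
            t = np.linspace(0.0, 1.0, n)
            contours.append(fn(t) + rng.normal(0.0, 1.0, n))
            names.append(name)
    return contours, names

contours, names = make_toy_contours()
vectors = np.array([normalize_contour(c) for c in contours])
codebook, labels = vq_codebook(vectors, n_codewords=4)
for name in dict.fromkeys(names):
    idx = [i for i, n in enumerate(names) if n == name]
    print(name, "-> codeword(s)", sorted(set(labels[idx].tolist())))
```

Each row of `codebook` is a representative contour shape. On real speech the number of codewords would have to be chosen from the data rather than fixed at four in advance.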