To characterize the paralinguistic information conveyed by prosody in communication, a multi-dimensional perceptual space for communicative speech prosody was derived by applying a psychometric method to multi-dimensional descriptions of impressions. Single-word utterances of "n" were used to avoid lexical effects and to cover as wide a range of communicative prosodic variation as possible. An analysis of daily conversations showed that impressions of conversational speech were manifested in the global F0 control of "n" as differences in average F0 height (high vs. low) and in dynamic F0 patterns (rise, fall, gradual fall, and rise-and-fall). Using controlled single utterances of "n", multidimensional scaling (MDS) was applied to a matrix of mutual distances computed from 26-dimensional vectors expressing perceptual impressions. The results revealed a three-dimensional perceptual impression space in which each dimension corresponded to a different F0 control characteristic. The positive-negative impression can be controlled by average F0 height, whereas the confident-doubtful and allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike the categorical classification of prosodic patterns common in studies of emotional prosody, this control characterization allows prosodic impressions to be described flexibly and quantitatively. These experimental results open the possibility of specifying the input to communicative prosody generation as impression vectors and controlling the output through average F0 height and F0 dynamic patterns. Rather than generating speech with categorical, prototypical prosody, more appropriate communicative speech synthesis can be approached through such input specifications and their correspondence with F0 control characteristics.
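The MDS step described above can be illustrated with a minimal Python/NumPy sketch of classical (Torgerson) MDS. It assumes Euclidean distances between the 26-dimensional impression vectors; the abstract does not state the distance metric or the MDS variant actually used. The function names `pairwise_euclidean` and `classical_mds` are hypothetical, and the demo data are random numbers standing in for impression scores, not results from the study.

```python
import numpy as np


def pairwise_euclidean(vectors: np.ndarray) -> np.ndarray:
    """Mutual distance matrix between the rows of an (n_stimuli, n_scales) array.

    Assumption: Euclidean distance; the source does not name the metric.
    """
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist: np.ndarray, n_dims: int = 3):
    """Classical (Torgerson) MDS.

    Embeds the stimuli so that Euclidean distances between the returned
    coordinates approximate `dist`. Returns (coords, eigvals), with the
    eigenvalues sorted in descending order.
    """
    n = dist.shape[0]
    # Double centering: B = -1/2 * J D^2 J, where J = I - (1/n) * 1 1^T
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[:n_dims], 0.0, None))
    coords = eigvecs[:, :n_dims] * scale
    return coords, eigvals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 12 hypothetical "n" stimuli x 26 impression scales
    impressions = rng.normal(size=(12, 26))
    dist = pairwise_euclidean(impressions)
    coords, eigvals = classical_mds(dist, n_dims=3)
    print(coords.shape)  # (12, 3)
    # Share of the positive eigenvalue mass captured by the first three dimensions
    pos = np.clip(eigvals, 0.0, None)
    print(round(float(pos[:3].sum() / pos.sum()), 3))
```

Classical MDS fixes the embedding only up to rotation and reflection, so relating the recovered axes to F0 characteristics such as average height would need a further interpretation step, for example correlating each axis with acoustic measurements of the stimuli. The eigenvalue profile is one common, though not the only, guide for choosing how many dimensions to retain.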