Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling

A multi-dimensional perceptual space for communicative speech prosodies was derived using a psychometric method from multi-dimensional expressions of impressions to characterize paralinguistic information conveyed by prosody in communication. Single word utterances of ''n'' were employed to allow freedom from lexical effects and to cover communicative prosodic variations as much as possible. The analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of ''n'' as differences of average height (high-low) and dynamic patterns (rise, fall, gradual fall, and rise&fall). Using controlled single utterances of ''n'', multiple dimensional scaling analysis was applied to a mutual distance matrix obtained by 26 dimensional vectors expressing perceptual impressions. The result showed the three-dimensional structure of a perceptual impression space, and each dimension corresponded to different F0 control characteristics. The positive-negative impression can be controlled by average F0 height while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike conventional categorical classification of prosodic patterns frequently observed in studies of emotional prosody, this control characterization enables us to flexibly and quantitatively describe prosodic impressions. These experimental results allow the possibility of input specifications for communicative prosody generation using impression vectors and control through average F0 height and F0 dynamic patterns. Instead of the generation of speech with categorical prototypical prosody, more adequate communicative speech synthesis can be approached through input specification and its correspondence with control characteristics.