Voice quality and f0 cues for affect expression: implications for synthesis

Synthesised stimuli were used to investigate how two notionally separable dimensions of tone-of-voice – voice quality and fundamental frequency (f0) – are involved in the expression of affect. Listeners were presented with three series of stimuli: (1) stimuli exemplifying different voice qualities, (2) stimuli all with modal voice quality but with different affect-related f0 contours, and (3) stimuli incorporating variation in both voice quality and affect-related f0 contours. A total of 15 stimuli were rated for 12 different affective attributes. Voice quality differentiation appears to account for the highest affect ratings overall, as indicated by the scores obtained for stimulus series (1) and (3). The relatively weaker affect signalling of stimuli differentiated by f0 alone corroborates findings in [2]. It also suggests that, to synthesise expressive, affectively coloured speech, it is not sufficient to manipulate only f0; we also need to capture the voice quality dimension of the voice source.