Intonational and visual cues in the perception of interrogative mode in Swedish

This paper presents results from two perception experiments designed to investigate intonational cues and visual facial cues to interrogative mode in Swedish. Results from the intonation test indicate that both a widened F0 range on a final focal accent and time alignment properties of the F0 rise and peak make important contributions to the interrogative percept. Results from the audiovisual test show that vertical head nodding and smiling tended to reinforce declarative intonation, while interrogative intonation was not strengthened by the hypothesized interrogative visual cues, which consisted of eyebrow movement and slow vertical head movement. The interaction between audio and visual cues for accentuation and interrogative mode is discussed, and some implications of adding the visual modality to the traditional definition of question intonation are explored.
The signaling of interrogative mode in speech is a topic which has long attracted interest from intonation researchers. The description of question intonation across languages has not, however, been simple and is far from uncontroversial. Different languages and different types of questions produce different kinds of question intonation. The most commonly described characteristics of questions are a high final pitch and an overall higher pitch [1]. In some languages, however, e.g. Neapolitan Italian [2], the time alignment of a final accent has been shown to play a decisive role in the perception of interrogative mode. In Swedish, question intonation has primarily been described as marked by a raised topline and a widened F0 range on the focal accent [3]. An optional terminal rise has been described, but the time alignment of the focal accent rise has not generally been associated with question intonation. Instead, a rightward shift of the focal accent peak has been associated with lending prominence to given domain-specific information in a dialogue context [4].
The role of visual facial cues in signaling the interrogative mode is an area which has not received as much attention. Considerable research has, however, been carried out on the timing and synchronization of articulator movements in audiovisual speech processing, e.g. [5], and on describing spoken and gestural conversational signals in human-to-human interactions [6]. There have also been exploratory investigations of visual cues for prominence and feedback signaling [7][8][9]. Work aimed at investigating the coordination of audio and visual interrogative signals in speech perception, and at implementing this knowledge in audiovisual synthesis, is not as well represented. The purpose of this study is to investigate both intonational …