Inside out - Acoustic and visual aspects of verbal and non-verbal communication : Keynote Paper

In face-to-face communication, both visual and auditory information play an obvious and significant role. In this presentation we will discuss work done, primarily at KTH, that aims at analyzing and modelling verbal and non-verbal communication from a multi-modal perspective. In our studies, it appears that both segmental and prosodic phenomena are strongly affected by the communicative context of speech interaction. One platform for modelling audiovisual speech communication is the embodied conversational agent (ECA). We will describe how ECAs have been used in our research, including examples of applications and a series of experiments for studying multimodal aspects of speech communication.
