Virtual Storytelling: Emotions for the narrator

The development of virtual storytelling is an ongoing process: as long as it does not perform at the level of a human storyteller, there is room for improvement. The Virtual Storyteller, a project of the HMI department of the University of Twente, uses a Text-to-Speech application that creates synthetic speech from text input. Making synthetic speech sound human is by no means trivial, especially where storytelling is involved: there are many facets of storytelling in which storytellers use their voice to enhance their performance. This thesis describes a study of how storytellers use their voice to convey the emotions that characters in a story are experiencing; specifically, how a storyteller changes his voice to make a character say something in an emotional way. An experiment was conducted to identify the emotional charge in fragments of speech by story characters. These fragments were then analysed in an attempt to find out how the emotional charge is linked to the way the storyteller changes his voice. The analysis was used to create a model with which the Text-to-Speech application synthesises emotional rather than neutral speech. The model has been implemented in an open-source Text-to-Speech application that uses a Dutch voice. This allows the Virtual Storyteller to create text marked with an emotion, which is then used to synthesise emotional speech.
