A VOWEL-STRESS EMOTIONAL SPEECH ANALYSIS METHOD

The analysis of speech, particularly for emotional content, is an open area of current research. This paper documents the development of a vowel-stress analysis framework for emotional speech, intended to support assessment of the speech assets obtained in terms of their prosodic attributes. Considering different levels of vowel stress provides a means by which the salient points of a signal can be analysed in terms of their overall priority to the listener. The prosodic attributes of these events can thus be assessed in terms of their overall significance, in an effort to provide a means of categorising the acoustic correlates of emotional speech. Vowel-stress analysis is performed in conjunction with the definition of pitch and intensity contours, alongside other micro-prosodic information relating to voice quality.

Keywords: Acoustic signal analysis, Speech analysis, Speech processing, Speech corpus.
