Discovering dimensions of perceived vocal expression in semi-structured, unscripted oral history accounts

What do people hear in expressive, unprompted speech? And how can their descriptions be transformed into a representative set of dimensions of vocal expression? This paper presents a methodology for collecting user descriptions of vocal expression, transforming those descriptions into a set of measurable expressive dimensions, and deriving a representative feature set and baseline classifiers for these dimensions. The resulting classifiers recognized the top 13 dimensions over an oral history corpus, with a maximum unweighted recall of 80.5%.
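The metric reported above, unweighted recall, averages per-class recall so that minority classes count as much as majority ones. The sketch below is illustrative only and not the paper's actual pipeline: it assumes a placeholder feature matrix X and binary labels y for a single perceived dimension, and uses a linear SVM as one plausible baseline classifier.

# Illustrative sketch (not the paper's pipeline): train a per-dimension
# binary classifier and score it with unweighted average recall (UAR),
# i.e. recall averaged equally over the classes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score

# X: acoustic feature matrix (n_segments x n_features); y: 0/1 labels for one
# perceived dimension (e.g. "breathy"). Both are random placeholders here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(X_train, y_train)

# 'macro'-averaged recall is the same quantity as unweighted average recall.
uar = recall_score(y_test, clf.predict(X_test), average="macro")
print(f"Unweighted average recall: {uar:.3f}")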
