On the Perception of Affect in the Singing Voice: A Study of Acoustic Cues

This study addresses the perception of affect in vocal and glottal recordings of a singing voice. In a listening experiment, the samples were rated on four broad affect terms that describe the two-dimensional model of emotion. A cross-tabulation of the singing expressions against the affect scores revealed how the expressions relate to the affect dimensions. Prosodic and spectral acoustic cues were extracted, and a statistical analysis of 22 features identified a set of cues whose means differ significantly with respect to valence and arousal: SPR, F5, B1, B4, mean pitch, mean intensity, brightness, jitter, shimmer, mean autocorrelation, mean HNR, mean LTAS, RMS, SPL, LPH, and LTAS slope. Principal component analysis was performed on the vocal and glottal features: two components explained 78.1% and 73.5%, respectively, of the original variance of the prosodic cues, and two components explained 86.3% and 86.7%, respectively, of the original variance of the prosodic and spectral cues combined.
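The "variance explained by two components" figures above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy data, and the choice to work on the raw (unstandardized) covariance matrix are all assumptions made for illustration. The fraction of variance explained by a component is its covariance eigenvalue divided by the sum of all feature variances; the leading eigenvalues are found here by power iteration with deflation.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvalues(cov, k, iters=500):
    """Largest k eigenvalues of a symmetric matrix via power iteration + deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy; deflation mutates it
    rng = random.Random(0)         # fixed seed so the sketch is reproducible
    eigenvalues = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector
        lam = sum(
            v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
        )
        eigenvalues.append(lam)
        # Remove the found component so the next pass finds the next largest
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return eigenvalues


def explained_variance_ratio(rows, k):
    """Fraction of total variance captured by each of the first k components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    return [lam / total for lam in top_eigenvalues(cov, k)]


# Synthetic toy data (not from the study): three hypothetical features per sample.
toy = [[2, 0, 0], [-2, 0, 0], [0, 1, 0], [0, -1, 0]]
print(explained_variance_ratio(toy, 2))
```

For the toy data the first two components carry roughly 80% and 20% of the variance. With real acoustic features measured in different units, standardizing each feature before the analysis is a common choice, but the abstract does not say whether that was done.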
