Investigating glottal parameters for differentiating emotional categories with similar prosodics

Speech prosodics (i.e., pitch, energy, etc.) play an important role in the interpretation of emotional expression. However, certain pairs of emotions can be difficult to discriminate because their prosodic statistics show similar tendencies. This paper targets speaker-dependent expressions of emotion pairs that share statistically similar prosodic information and investigates whether a set of glottal features can reveal measurable differences between these expressions. Evaluation is based on acted emotional utterances from the Emotional Prosody and Speech Transcript (EPST) database. Although acted speech is not assumed to provide a complete picture of authentic emotion, it is valuable because the actors adjusted their voice quality to fit their perception of each emotion. Results show statistically significant differences (p ≪ 0.05) in at least one glottal feature for all 30 emotion pairs in which prosodic features showed no significant difference. In addition, using a single glottal feature reduced classification error for 24 emotion pairs relative to pitch or energy.