Automatic recognition of emotion evoked by general sound events

Without a doubt, there is emotion in sound. So far, however, research efforts have focused on emotion in speech and music, despite the many applications in emotion-sensitive sound retrieval. This paper is an attempt at automatic emotion recognition of general sounds. We select sound clips from different areas of the daily human environment and model them using the increasingly popular dimensional approach in the arousal and valence space. To establish a reliable ground truth, we compare the mean and median of four annotators' ratings with their evaluator weighted estimator. We discuss the consistency of the human labelers, feature relevance, and automatic regression. Results reach correlation coefficients of .61 for arousal and .49 for valence.
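As an illustration of the evaluator weighted estimator mentioned above, the following is a minimal sketch assuming the common formulation (after Grimm and Kroschel), where each rater's labels are weighted by the correlation between that rater and the mean of the remaining raters; the function name, variable names, and the example ratings are illustrative and not taken from the paper.

```python
import numpy as np

def evaluator_weighted_estimator(ratings):
    """Fuse per-rater labels into a single ground-truth rating per item.

    ratings: array of shape (n_raters, n_items), e.g. arousal values
    assigned by each annotator to each sound clip.

    Each rater k receives a weight r_k equal to the Pearson correlation
    between their ratings and the mean of the other raters' ratings
    (one common formulation of the EWE; details may differ per paper).
    """
    ratings = np.asarray(ratings, dtype=float)
    n_raters = ratings.shape[0]
    weights = np.empty(n_raters)
    for k in range(n_raters):
        # Mean rating of all raters except rater k
        others = np.delete(ratings, k, axis=0).mean(axis=0)
        weights[k] = np.corrcoef(ratings[k], others)[0, 1]
    # Guard against negative correlations acting as negative weights
    weights = np.clip(weights, 0.0, None)
    # Weighted average across raters, normalized by the weight sum
    return weights @ ratings / weights.sum()

# Hypothetical example: four annotators rating five clips on arousal
ratings = [[0.8, 0.2, 0.5, 0.9, 0.1],
           [0.7, 0.3, 0.4, 0.8, 0.2],
           [0.9, 0.1, 0.6, 1.0, 0.0],
           [0.6, 0.4, 0.5, 0.7, 0.3]]
print(evaluator_weighted_estimator(ratings))
```

Compared with a plain mean or median over raters, this weighting down-ranks annotators who agree poorly with the rest, which is why the abstract contrasts the three fusion schemes when establishing the ground truth.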