Clustering audio clips by context-free description and affective ratings

In the absence of context, listening to an acoustic scene yields both an explicit semantic description and an implicit assessment of the scene's affective value. In this work, we exploit the relationship between context-free associations of audio clips containing unconstrained acoustic sources and their affective values for clustering. Using over two hundred clips from the BBC sound effects library, we present a novel, quantitative method to compare the clusters of audio clips obtained from their context-free descriptions with the clusters obtained from their affective measures, namely valence, arousal, and dominance. Our results indicate that comparing clusters across representations is a suitable approach for determining an appropriate number of clusters to index audio clips in an unsupervised manner. In this paper we present our findings along with examples of the resulting clusters of audio clips.
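
As a rough illustration of the cluster-comparison idea, the sketch below clusters the same set of clips in two representation spaces, a latent semantic space derived from free-text descriptions and the three-dimensional valence-arousal-dominance space, and scores the agreement between the two partitions for each candidate number of clusters. The specific choices here (TF-IDF plus truncated SVD for the semantic space, k-means for clustering, and the adjusted Rand index for agreement) are assumptions made for illustration, not the paper's actual pipeline, and the inputs are placeholders rather than the BBC data.

    # Hypothetical sketch: cluster clips by semantic description and by
    # affective ratings, then compare the partitions across candidate
    # cluster counts. Algorithms and data below are illustrative stand-ins.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import adjusted_rand_score

    # Placeholder inputs: one free-text description and one
    # (valence, arousal, dominance) triple per clip.
    descriptions = ["dog barking in a park", "rain on a tin roof",
                    "crowd cheering at a match", "glass shattering on tile"]
    vad_ratings = np.array([[6.1, 5.2, 5.0], [5.5, 3.1, 4.2],
                            [7.8, 6.9, 5.9], [2.4, 6.5, 3.3]])

    # Latent semantic representation of the context-free descriptions:
    # TF-IDF followed by truncated SVD, in the spirit of LSA.
    tfidf = TfidfVectorizer().fit_transform(descriptions)
    semantic = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

    # For each candidate number of clusters k, cluster both spaces
    # independently and score how well the two partitions agree.
    for k in range(2, 4):
        sem_labels = KMeans(n_clusters=k, n_init=10,
                            random_state=0).fit_predict(semantic)
        vad_labels = KMeans(n_clusters=k, n_init=10,
                            random_state=0).fit_predict(vad_ratings)
        print(k, adjusted_rand_score(sem_labels, vad_labels))

Under this reading, a k at which the two partitions agree strongly would be a natural candidate for the number of clusters used to index the clips.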
