Classification of sound clips by two schemes: using onomatopoeia and semantic labels

Using a recently proposed framework for latent perceptual indexing of audio clips, we present classification of whole clips under two labeling schemes: high-level semantic labels and mid-level, perceptually motivated onomatopoeic labels. First, feature vectors extracted from the clips in the database are grouped into reference clusters by an unsupervised clustering technique. A unit-document co-occurrence matrix is then obtained by quantizing the feature vectors of each audio clip against the reference clusters. The audio clips are mapped to a latent perceptual space through a reduced-rank approximation of this matrix. Classification experiments are performed in this representation space using the corresponding semantic and onomatopoeic labels of the clips. With the proposed method, a classification accuracy of about 60% was obtained on the BBC Sound Effects Library across more than twenty categories. Combining the two labeling schemes in a single framework makes the classification system more flexible, since each scheme addresses the limitations of the other; this complementarity is the main motivation for the work presented here.
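The pipeline described above (unsupervised clustering of frame-level features, quantization into a clip-by-cluster co-occurrence matrix, and a reduced-rank projection into a latent space) can be sketched roughly as follows. This is a minimal illustration with synthetic data, not the authors' implementation: the k-means routine, the cluster count, the rank, and the nearest-neighbor classifier are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Toy k-means standing in for the unsupervised reference clustering."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(frames, centroids):
    """Map each frame-level feature vector to its nearest reference cluster."""
    dists = np.linalg.norm(frames[:, None] - centroids[None], axis=2)
    return dists.argmin(axis=1)

# Synthetic "clips": each clip is a matrix of frame-level feature vectors.
clips = [rng.normal(loc=c, size=(30, 4)) for c in (0.0, 0.0, 3.0, 3.0)]
labels = ["speech", "speech", "music", "music"]  # placeholder category labels

# Step 1: reference clusters from all frames in the database.
centroids = kmeans(np.vstack(clips), k=8)

# Step 2: unit-document co-occurrence matrix (frame counts per cluster, per clip).
M = np.zeros((len(clips), len(centroids)))
for i, clip in enumerate(clips):
    for j in quantize(clip, centroids):
        M[i, j] += 1

# Step 3: reduced-rank approximation maps clips into a latent perceptual space.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = 2
latent = U[:, :r] * s[:r]  # one low-dimensional point per clip

# Step 4: classify a held-out clip by nearest neighbor in the latent space.
query = rng.normal(loc=3.0, size=(30, 4))
q_counts = np.zeros(len(centroids))
for j in quantize(query, centroids):
    q_counts[j] += 1
q_latent = q_counts @ Vt[:r].T  # standard fold-in via the right singular vectors
nearest = np.linalg.norm(latent - q_latent, axis=1).argmin()
predicted = labels[nearest]
```

In practice the classifier in the latent space could equally be an SVM, as used by several of the works this paper builds on; nearest-neighbor is used here only to keep the sketch self-contained.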
