Audio contributions to semantic video search

This paper summarizes the contributions that the audio signal can make to semantic video search. Because of space restrictions, the emphasis is on non-linguistic cues; the paper thus covers what is generally known as audio segmentation, as well as audio event detection. Using machine learning approaches, we have built detectors for over 50 semantic audio concepts.
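As an illustration of the general pipeline behind such audio concept detectors (not the authors' actual system), the sketch below frames a signal, extracts simple per-frame features, and classifies clips. All names and parameters here are hypothetical: log-energy and zero-crossing rate stand in for the richer spectral features (e.g. MFCCs) real detectors use, and a toy nearest-centroid classifier stands in for the SVM or GMM models common in this literature.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    # Per-frame log-energy and zero-crossing rate: a toy stand-in for the
    # richer spectral features (e.g. MFCCs) a real detector would use.
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        log_energy = np.log(np.sum(frame ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        feats.append((log_energy, zcr))
    return np.array(feats)

def clip_feature(signal):
    # Summarize a clip by its mean frame-level feature vector.
    return frame_features(signal).mean(axis=0)

class NearestCentroidDetector:
    # Hypothetical minimal classifier standing in for the per-concept
    # SVM/GMM models; features are z-normalized with training statistics.
    def fit(self, X, y):
        self.mu, self.sd = X.mean(axis=0), X.std(axis=0) + 1e-12
        Xn = (X - self.mu) / self.sd
        self.centroids = {c: Xn[np.array(y) == c].mean(axis=0)
                          for c in sorted(set(y))}
        return self

    def predict(self, x):
        xn = (x - self.mu) / self.sd
        return min(self.centroids,
                   key=lambda c: np.linalg.norm(xn - self.centroids[c]))

# Synthetic training clips: harmonic "tone" vs broadband "noise".
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0
tones = [a * np.sin(2 * np.pi * 440 * t) for a in (0.5, 0.8, 1.0)]
noises = [a * rng.standard_normal(t.size) for a in (0.5, 0.8, 1.0)]
X = np.array([clip_feature(s) for s in tones + noises])
y = ["tone"] * 3 + ["noise"] * 3

clf = NearestCentroidDetector().fit(X, y)
print(clf.predict(clip_feature(np.sin(2 * np.pi * 300 * t))))  # tone
print(clf.predict(clip_feature(rng.standard_normal(t.size))))  # noise
```

Scaling this from two toy classes to dozens of semantic concepts is then a matter of training one detector per concept on labeled audio, which is the setting the paper describes.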
