Paper Information - Semi-automatic audio semantic concept discovery for multimedia retrieval

Semi-automatic audio semantic concept discovery for multimedia retrieval

A huge number of videos on the Internet carry little textual information, which makes retrieving them with a text query challenging. Previous work explored semantic concepts for content analysis to assist retrieval. However, human-defined concepts might fail to cover the data, and there is a potential gap between these concepts and the semantics expected from users' queries. Building an annotated corpus is also expensive and time-consuming. To address these issues, we propose a semi-automatic framework to discover semantic concepts. We limit ourselves to the audio modality here. In the paper, we also discuss how to select a meaningful vocabulary from the discovered hierarchical sub-categories, and we provide an approach to detect all the concepts without further annotation. We evaluate the method on the NIST 2011 Multimedia Event Detection (MED) dataset.

Florian Metze | Yipei Wang | Shourabh Rawat
