Paper information - How to put it into words - using random forests to extract symbol level descriptions from audio content for concept detection

How to put it into words - using random forests to extract symbol level descriptions from audio content for concept detection

This paper presents a system that uses symbolic representations of audio concepts as words for describing audio tracks. This lets it go beyond the state of the art, which is audio event classification of a small number of audio classes in constrained settings, to large-scale classification in the wild. These audio words may be less meaningful to a human annotator, but they are descriptive for computer algorithms. We devise a random-forest vocabulary learning method with an audio word weighting scheme based on TF-IDF and TD-IDD, combining the computational simplicity and accurate multi-class classification of the random forest with the data-driven discriminative power of the TF-IDF/TD-IDD methods. The proposed random forest clustering with text-retrieval methods significantly outperforms two state-of-the-art methods on the dry-run set and the full set of the TRECVID MED 2010 dataset.
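As a rough illustration of the weighting idea only, the sketch below applies a plain TF-IDF weighting to tracks that have already been turned into sequences of audio words. It assumes each audio word is an identifier such as a tree-and-leaf label produced by a trained forest; the labels, the `tf_idf` function name, and the specific TF-IDF variant (relative term frequency times log inverse document frequency) are assumptions for illustration, not the paper's exact formulation, and TD-IDD is not shown.

```python
import math
from collections import Counter


def tf_idf(tracks):
    """Weight audio words per track with a basic TF-IDF scheme.

    tracks: list of lists of audio-word ids, one list per track.
    Returns one dict per track mapping word id -> weight.
    """
    n_tracks = len(tracks)

    # Document frequency: number of tracks in which each word occurs.
    df = Counter()
    for words in tracks:
        df.update(set(words))

    weighted = []
    for words in tracks:
        counts = Counter(words)
        total = len(words)
        # Relative term frequency times log inverse document frequency.
        weighted.append({
            w: (c / total) * math.log(n_tracks / df[w])
            for w, c in counts.items()
        })
    return weighted


# Hypothetical audio words: "t<tree>_leaf<index>" labels (assumed naming).
tracks = [
    ["t0_leaf3", "t0_leaf3", "t1_leaf7"],
    ["t0_leaf3", "t1_leaf2"],
    ["t0_leaf5", "t1_leaf7", "t1_leaf7"],
]
weights = tf_idf(tracks)
```

In this toy example, a word that occurs in only one track (such as `t1_leaf2`) receives a larger inverse-document-frequency factor than one shared by two tracks, which is the discriminative effect the abstract attributes to the text-retrieval weighting.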

Mark Hasegawa-Johnson | Gerald Friedland | Ajay Divakaran | Po-Sen Huang | Robert Mertens
