Paper information - Bag-of-multimedia-words for image classification

Bag-of-multimedia-words for image classification

We introduce the bag-of-multimedia-words model, which tightly combines the heterogeneous information coming from the text and the pixel content of a multimedia document. The proposed multimedia feature generation process is generic across modalities and aims to enrich a multimedia document description with compact, discriminative signatures well suited to linear classifiers. Evaluated on the Pascal VOC 2007 classification challenge, it outperforms state-of-the-art classification approaches based on bag-of-visual-words or bag-of-tag-words.
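For orientation, the two baseline representations named in the abstract, bag-of-visual-words and bag-of-tag-words, can be sketched as plain histograms. The snippet below is a minimal illustration under stated assumptions, not the bag-of-multimedia-words construction itself, which the abstract does not detail. The function names, the hard nearest-codeword assignment, the L1 normalization, and the simple concatenation of the two histograms are all hypothetical choices made for the example.

```python
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def l1_normalize(hist):
    """Scale a histogram so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def visual_word_histogram(descriptors, codebook):
    """Bag-of-visual-words: hard-assign each local descriptor to its nearest codeword."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return l1_normalize([counts.get(i, 0) for i in range(len(codebook))])


def tag_word_histogram(tags, vocabulary):
    """Bag-of-tag-words: count tags that appear in a fixed vocabulary."""
    counts = Counter(t.lower() for t in tags)
    return l1_normalize([counts.get(w, 0) for w in vocabulary])


def fused_signature(descriptors, codebook, tags, vocabulary):
    """Assumed naive fusion: concatenate the two normalized histograms."""
    return visual_word_histogram(descriptors, codebook) + tag_word_histogram(tags, vocabulary)


# Toy data (hypothetical): 2-D "descriptors", a 2-word visual codebook, 3 tag words.
codebook = [[0.0, 0.0], [1.0, 1.0]]
vocabulary = ["cat", "dog", "car"]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [0.2, 0.1]]
tags = ["Cat", "pet", "cat"]  # "pet" is out of vocabulary and is ignored

signature = fused_signature(descriptors, codebook, tags, vocabulary)
print(signature)
```

A fixed-length vector of this kind is what a linear classifier would consume. The abstract's claim is that a tighter combination of the two modalities yields more discriminative signatures than either histogram alone.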
