SemanGist: A Local Semantic Image Representation

Although many kinds of image features have been proposed, no single feature is powerful enough to replace all the others in multimedia analysis applications such as image annotation. In this paper, we propose a novel image representation, Semantic Gist (SemanGist), which automatically combines the merits of multiple features. Given a local image patch, SemanGist converts multiple low-level features of the patch into compact prediction scores over a small set of predefined semantic categories; to this end, a discriminative multi-label boosting algorithm is adopted. Because SemanGist is computed locally, it allows semantic spatial context among adjacent patches to be incorporated. For applications such as image annotation, this context can further reduce annotation errors by enforcing label compatibility: the same boosting algorithm is applied to the SemanGist representation, together with the low-level features, to ensure compatible labels. Experiments on an image annotation task show that SemanGist not only yields a compact representation but also incorporates spatial context at low run-time computational cost.
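To make the two-stage pipeline concrete, the sketch below shows one possible reading of it in Python. The paper's discriminative multi-label boosting algorithm is stood in for by scikit-learn's GradientBoostingClassifier in a one-vs-rest wrapper, and the patch grid size, feature dimensionality, number of categories, and neighbourhood definition are all illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch of a SemanGist-style two-stage pipeline (not the authors' code).
# Stage 1 maps low-level patch features to compact semantic scores; stage 2
# re-applies the same boosting machinery to those scores plus spatial context.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multiclass import OneVsRestClassifier

np.random.seed(0)
N_PATCHES, D_LOWLEVEL, N_CATEGORIES = 400, 128, 10   # assumed toy dimensions

# Low-level features per patch (e.g. concatenated color/texture descriptors)
# and multi-label ground truth over the predefined semantic categories.
X_low = np.random.rand(N_PATCHES, D_LOWLEVEL)
Y = (np.random.rand(N_PATCHES, N_CATEGORIES) > 0.8).astype(int)

# Stage 1: convert low-level patch features into semantic prediction scores.
stage1 = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=50))
stage1.fit(X_low, Y)
semangist = stage1.predict_proba(X_low)              # (N_PATCHES, N_CATEGORIES)

# Spatial context: concatenate each patch's SemanGist with the scores of its
# left/right neighbours on an assumed 20x20 patch grid (wraparound for brevity).
grid = semangist.reshape(20, 20, N_CATEGORIES)
left = np.roll(grid, 1, axis=1).reshape(N_PATCHES, N_CATEGORIES)
right = np.roll(grid, -1, axis=1).reshape(N_PATCHES, N_CATEGORIES)
X_context = np.hstack([X_low, semangist, left, right])

# Stage 2: the same boosting scheme over SemanGist plus low-level features,
# refining the per-patch labels while accounting for label compatibility.
stage2 = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=50))
stage2.fit(X_context, Y)
refined_scores = stage2.predict_proba(X_context)
print(refined_scores.shape)                          # (400, 10)
```

A real implementation would obtain the stage-one scores used for training stage two from held-out predictions rather than reusing the training fit as done here; the sketch omits this for brevity.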
