University of Amsterdam at the Visual Concept Detection and Annotation Tasks

Visual concept detection is important to access visual information on the level of objects and scene types. The current state-of-the-art in visual concept detection and annotation tasks is based on the bag-of-words model. Within the bag-of-words model, points are first sampled according to some strategy, then the areas around these points are described using color descriptors. These descriptors are then vector-quantized against a codebook of prototypical descriptors, which results in a fixed-length representation of the image. Based on these representations, visual concept models are trained. In this chapter, we discuss the design choices within the bag-of-words model and their implications for concept detection accuracy.
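The quantization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy data, and the choice of Euclidean nearest-neighbor assignment are assumptions made for the example; in practice the codebook would be learned (e.g. by k-means) from a large pool of training descriptors.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Vector-quantize local descriptors against a codebook and
    return a fixed-length, L1-normalized bag-of-words histogram."""
    # Pairwise squared Euclidean distances, shape (N, K)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Each descriptor votes for its nearest prototypical descriptor
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    # Normalize so images with different numbers of sampled points
    # yield comparable fixed-length representations
    return hist / hist.sum()

# Toy example: 100 random 8-dimensional "descriptors",
# quantized against a codebook of 16 prototypes
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 8))
codebook = rng.normal(size=(16, 8))
h = bow_histogram(descriptors, codebook)
```

The resulting histogram has one bin per codeword regardless of how many points were sampled, which is what makes it a suitable fixed-length input for training the concept models.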
