UvA-DARE ( Digital Academic Repository ) The MediaMill TRECVID 2008 semantic video search engine

In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of concept detectors. To that end our concept detection experiments emphasize in particular the role of sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. A robust but limited set of concept detectors necessitates the need to rely on as many auxiliary information channels as possible for retrieval. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and active learning mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runnerup ranking for automatic retrieval. Again a lot has been learned during this year’s TRECVID campaign, we highlight the most important lessons at the end of this paper.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Wilson S. Geisler,et al.  Multichannel Texture Analysis Using Localized Spatial Filters , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[4]  Anil K. Jain,et al.  Statistical Pattern Recognition: A Review , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[7]  Arnold W. M. Smeulders,et al.  Color Invariance , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[9]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[10]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[11]  Milind R. Naphade On supervision and statistical learning for semantic multimedia analysis , 2004, J. Vis. Commun. Image Represent..

[12]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[13]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[14]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[15]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Marcel Worring,et al.  On the surplus value of semantic video analysis beyond the key frame , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[17]  Arnold W. M. Smeulders,et al.  c ○ 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands. A Six-Stimulus Theory for Stochastic Texture , 2002 .

[18]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[19]  Jan-Mark Geusebroek,et al.  Compact Object Descriptors from Local Colour Invariant Histograms , 2006, BMVC.

[20]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[23]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[24]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[25]  Cor J. Veenman,et al.  The influence of cross-validation on video classification performance , 2006, MM '06.

[26]  Marcel Worring,et al.  Query on demand video browsing , 2007, ACM Multimedia.

[27]  Marcel Worring,et al.  A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval , 2007, IEEE Transactions on Multimedia.

[28]  Dong Wang,et al.  Video diver: generic video indexing with diverse features , 2007, MIR '07.

[29]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[30]  Franciska de Jong,et al.  Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition , 2007, SAMT.

[31]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[32]  Maarten de Rijke,et al.  Exploiting redundancy in cross-channel video retrieval , 2007, MIR '07.

[33]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[34]  Marcel Worring,et al.  Balancing thread based navigation for targeted video search , 2008, CIVR '08.

[35]  Apostol Natsev,et al.  Web-based information content and its application to concept-based video retrieval , 2008, CIVR '08.

[36]  Stéphane Ayache,et al.  Video Corpus Annotation Using Active Learning , 2008, ECIR.

[37]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Andrew Zisserman,et al.  Efficient Visual Search for Objects in Videos , 2008, Proceedings of the IEEE.

[40]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[41]  Katja Hofmann,et al.  Assessing concept selection for video retrieval , 2008, MIR '08.

[42]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[43]  Gertjan J. Burghouts,et al.  Performance evaluation of local colour invariants , 2009, Comput. Vis. Image Underst..

[44]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[45]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[46]  Cor J. Veenman,et al.  Episode-Constrained Cross-Validation in Video Concept Retrieval , 2009, IEEE Transactions on Multimedia.

[47]  Arnold W. M. Smeulders,et al.  Real-Time Visual Concept Classification , 2010, IEEE Transactions on Multimedia.

[48]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Arnold W. M. Smeulders,et al.  Visual-Concept Search Solved? , 2010, Computer.

[50]  Cor J. Veenman,et al.  Comparing compact codebooks for visual categorization , 2010, Comput. Vis. Image Underst..

[51]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[52]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[53]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.