The MediaMill TRECVID 2008 Semantic Video Search Engine
Abstract: In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors using a bag-of-words approach. To that end, our concept detection experiments emphasize in particular the role of visual sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors makes it necessary to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse dimension and of active learning mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again, a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.
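To make the bag-of-words idea mentioned in the abstract concrete, the following is a minimal sketch of one common variant: each local descriptor is hard-assigned to its nearest codeword and the counts are normalized into a histogram. It is an illustration under stated assumptions, not the MediaMill implementation. The function name `bag_of_words_histogram`, the Euclidean distance, the L1 normalization, and the toy 2-D data are all hypothetical choices; the abstract does not specify the sampling strategy, descriptors, or codebook construction that were actually used.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bag_of_words_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword.

    Returns an L1-normalized histogram with one bin per codeword.
    Assumption: Euclidean distance and hard assignment; the paper's
    actual choices are not given in the abstract.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Index of the codeword closest to this descriptor.
        nearest = min(range(len(codebook)),
                      key=lambda k: dist(descriptor, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (hypothetical): three codewords and four 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (3.9, 4.2)]
histogram = bag_of_words_histogram(descriptors, codebook)
```

In a full pipeline, a fixed-length histogram of this kind would typically serve as the input vector for a kernel-based classifier, one per concept.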