相关论文

Abstract:In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interac- tive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors using a bag-of-words approach. To that end, our concept detection experiments emphasize in particular the role of visual sampling, the value of color in- variant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors ne- cessitates the need to rely on as many auxiliary information channels as possible. Therefore, our automatic search ex- periments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval re- sults further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse- dimension and active learning mechanisms that learn to solve complex search topics by analysis from user brows- ing behavior. The 2008 edition of the TRECVID bench- mark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept de- tection and interactive search, and a runner-up ranking for automatic retrieval. Again a lot has been learned during this year’s TRECVID campaign; we highlight the most im- portant lessons at the end of this paper.

参考文献

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[3]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[4]  M M Astrahan SPEECH ANALYSIS BY CLUSTERING, OR THE HYPERPHONEME METHOD , 1970 .

[5]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[6]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[7]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[8]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[9]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[10]  Gerard Salton,et al.  Improving retrieval performance by relevance feedback , 1997, J. Am. Soc. Inf. Sci..

[11]  Wilson S. Geisler,et al.  Multichannel Texture Analysis Using Localized Spatial Filters , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[13]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[14]  Edward A. Fox,et al.  Combination of Multiple Searches , 1993, TREC.

[15]  Yukinobu Taniguchi,et al.  Structured Video Computing , 1994, IEEE MultiMedia.

[16]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[17]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[18]  Philippe Joly,et al.  Efficient automatic analysis of camera work and microsegmentation of video using spatiotemporal images , 1996, Signal Process. Image Commun..

[19]  David L. Sheinberg,et al.  Visual object recognition. , 1996, Annual review of neuroscience.

[20]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Takeo Kanade,et al.  Video OCR: indexing digital news libraries by recognition of superimposed captions , 1999, Multimedia Systems.

[22]  Yihong Gong,et al.  Lessons Learned from Building a Terabyte Digital Video Library , 1999, Computer.

[23]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  Robert P. W. Duin,et al.  PRTools - Version 3.0 - A Matlab Toolbox for Pattern Recognition , 2000 .

[25]  Anil K. Jain,et al.  Statistical Pattern Recognition: A Review , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Peter M. A. Sloot,et al.  The distributed ASCI Supercomputer project , 2000, OPSR.

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[29]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[30]  Anil K. Jain,et al.  Image classification for content-based indexing , 2001, IEEE Trans. Image Process..

[31]  Arnold W. M. Smeulders,et al.  Color Invariance , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Cees G. M. Snoek The authoring metaphor to machine understanding of multimedia , 2001 .

[33]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[34]  Djoerd Hiemstra,et al.  Lazy Users and Automatic Video Retrieval Tools in (the) Lowlands , 2001, TREC.

[35]  Cor J. Veenman,et al.  Resolving Motion Correspondence for Densely Moving Points , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[37]  Georges Quénot,et al.  CLIPS at TREC 11: Experiments in Video Retrieval , 2002, TREC.

[38]  Erik F. Tjong Kim Sang,et al.  Memory-Based Shallow Parsing , 2002, J. Mach. Learn. Res..

[39]  M. Tarr,et al.  Visual Object Recognition , 1996, ISTCS.

[40]  Marcel Worring,et al.  Interactive Search Using Indexing, Filtering, Browsing and Ranking , 2003, TRECVID.

[41]  Thomas S. Huang,et al.  Relevance feedback in image retrieval: A comprehensive review , 2003, Multimedia Systems.

[42]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[43]  Harriet J. Nock,et al.  Discriminative model fusion for semantic concept detection and annotation in video , 2003, ACM Multimedia.

[44]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[45]  Jianping Fan,et al.  ClassView: hierarchical video shot classification, indexing, and accessing , 2004, IEEE Transactions on Multimedia.

[46]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[47]  Ted Pedersen,et al.  WordNet::Similarity - Measuring the Relatedness of Concepts , 2004, NAACL.

[48]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[49]  Yi Wu,et al.  Ontology-based multi-classification learning for video concept detection , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[50]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[51]  Dirk P. Kroese,et al.  The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning , 2004 .

[52]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[53]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[54]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[55]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[56]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[57]  Milind R. Naphade On supervision and statistical learning for semantic multimedia analysis , 2004, J. Vis. Commun. Image Represent..

[58]  Dirk P. Kroese,et al.  The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (Information Science and Statistics) , 2004 .

[59]  Marcel Worring,et al.  Multimodal Video Indexing : A Review of the State-ofthe-art , 2001 .

[60]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[61]  Bernt Schiele,et al.  Natural Scene Retrieval Based on a Semantic Modeling Step , 2004, CIVR.

[62]  Marcel Worring,et al.  The MediaMill TRECVID 2004 Semantic Viedo Search Engine , 2004, TRECVID.

[63]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[64]  Marcel Worring,et al.  Multimedia event-based video indexing using time intervals , 2005, IEEE Transactions on Multimedia.

[65]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[66]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[67]  Cees G. M. Snoek,et al.  Early versus late fusion in semantic video analysis , 2005, MULTIMEDIA '05.

[68]  Antonio Torralba,et al.  Describing Visual Scenes using Transformed Dirichlet Processes , 2005, NIPS.

[69]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[71]  Djoerd Hiemstra,et al.  An Integrated Approach to Text and Image Retrieval- The Lowlands Team at Trecvid 2005 , 2005, TRECVID.

[72]  G. P. Nguyen,et al.  The MediaMill TRECVID 2005 Semantic Video Search Engine (Draft Version). , 2005 .

[73]  Joost van de Weijer,et al.  Edge and corner detection by photometric quasi-invariants , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[74]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[75]  Marcel Worring,et al.  User transparent parallel processing of the 2004 NIST TRECVID data set , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[76]  Arnold W. M. Smeulders,et al.  Color texture measurement and segmentation , 2005, Signal Process..

[77]  B. Huurnink Autoseek towards a Fully Automated Video Search System Acknowledgements , 2005 .

[78]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[79]  G. P. Nguyen,et al.  Similarity Based Visualization of Image Collections , 2005 .

[80]  Marcel Worring,et al.  On the surplus value of semantic video analysis beyond the key frame , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[81]  Joshua R. Smith,et al.  A Web-based System for Collaborative Annotation of Large Image and Video Collections , 2005 .

[82]  Joost van de Weijer,et al.  Boosting saliency in color image features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[83]  Shih-Fu Chang,et al.  Visual Cue Cluster Construction via Information Bottleneck Principle and Kernel Density Estimation , 2005, CIVR.

[84]  Arnold W. M. Smeulders,et al.  c ○ 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands. A Six-Stimulus Theory for Stochastic Texture , 2002 .

[85]  Marcel Worring,et al.  Learning rich semantics from news video archives by style analysis , 2006, TOMCCAP.

[86]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[87]  Jan-Mark Geusebroek,et al.  Compact Object Descriptors from Local Colour Invariant Histograms , 2006, BMVC.

[88]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, CVPR Workshops.

[89]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[90]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[91]  Lih-Yuan Deng,et al.  The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.

[92]  Cor J. Veenman,et al.  Robust Scene Categorization by Learning Image Statistics in Context , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[93]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[94]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[95]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[96]  Marcel Worring,et al.  Browsing News Video using Semantic Threads , 2006 .

[97]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[98]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[99]  Cor J. Veenman,et al.  The influence of cross-validation on video classification performance , 2006, MM '06.

[100]  Cordelia Schmid,et al.  Coloring Local Feature Extraction , 2006, ECCV.

[101]  K. V. D. Sande,et al.  Coloring Concept Detection in Video Using Interest Regions Coloring Concept Detection in Video Using Interest Regions Specialization: Multimedia and Intelligent Systems , 2007 .

[102]  Marcel Worring,et al.  Query on demand video browsing , 2007, ACM Multimedia.

[103]  Marcel Worring,et al.  A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval , 2007, IEEE Transactions on Multimedia.

[104]  Marcel Worring,et al.  Multi Thread Video Browsing , 2007 .

[105]  Dong Wang,et al.  Video diver: generic video indexing with diverse features , 2007, MIR '07.

[106]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[107]  Maarten de Rijke,et al.  The value of stories for speech-based video search , 2007, CIVR '07.

[108]  Jiawei Han,et al.  Efficient Kernel Discriminant Analysis via Spectral Regression , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[109]  Franciska de Jong,et al.  Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition , 2007, SAMT.

[110]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[111]  Xie Kanglin Lucene Search Engine , 2007 .

[112]  Maarten de Rijke,et al.  Exploiting redundancy in cross-channel video retrieval , 2007, MIR '07.

[113]  Marcel Worring,et al.  Adding Semantics to Detectors for Video Retrieval , 2007, IEEE Transactions on Multimedia.

[114]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[115]  Marcel Worring,et al.  Balancing thread based navigation for targeted video search , 2008, CIVR '08.

[116]  M. de Rijke,et al.  UvA-DARE ( Digital Academic Repository ) The MediaMill TRECVID 2008 semantic video search engine , 2008 .

[117]  Apostol Natsev,et al.  Web-based information content and its application to concept-based video retrieval , 2008, CIVR '08.

[118]  Jieping Ye,et al.  Multi-class Discriminant Kernel Learning via Convex Programming , 2008, J. Mach. Learn. Res..

[119]  Meng Wang,et al.  MSRA atT TRECVID 2008: High-Level Feature Extraction and Automatic Search , 2008, TRECVID.

[120]  Stéphane Ayache,et al.  Video Corpus Annotation Using Active Learning , 2008, ECIR.

[121]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[122]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[123]  Andrew Zisserman,et al.  Efficient Visual Search for Objects in Videos , 2008, Proceedings of the IEEE.

[124]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[125]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[126]  Katja Hofmann,et al.  Assessing concept selection for video retrieval , 2008, MIR '08.

引用
Ensemble Learning with LDA Topic Models for Visual Concept Detection
2012
A Multimodal Scheme for Program Segmentation and Representation in Broadcast Video Streams
IEEE Transactions on Multimedia
2008
Smart Video Browsing with Augmented Navigation Bars
MMM
2013
Semantic concept-based query expansion and re-ranking for multimedia retrieval
ACM Multimedia
2007
Fisher Kernel Temporal Variation-based Relevance Feedback for video retrieval
Comput. Vis. Image Underst.
2016
Vocabulary Expansion Using Word Vectors for Video Semantic Indexing
ACM Multimedia
2015
A reranking approach for context-based concept fusion in video indexing and retrieval
CIVR '07
2007
n-gram Models for Video Semantic Indexing
ACM Multimedia
2014
A Selective Weighted Late Fusion for Visual Concept Recognition
ECCV Workshops
2012
Query by Example by Extracting Inductive Query Definitions Using Rough Set Theory
2012
Fusion in Computer Vision
Advances in Computer Vision and Pattern Recognition
2014
A Novel Method for Spoken Text Feature Extraction in Semantic Video Retrieval
PCM
2006
Covering the Space of Tilts. Application to Affine Invariant Image Comparison
SIAM J. Imaging Sci.
2018
A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors
IEEE Transactions on Multimedia
2012
Extreme video retrieval: joint maximization of human and computer performance
MM '06
2006
A System That Learns to Tag Videos by Watching Youtube
ICVS
2008
Video browsing interfaces and applications: a review
2010
Zhejiang University at TRECVID 2006
TRECVID
2006
PKU_ICST at TRECVID 2018: Instance Search Task
TRECVID
2013
Memory recall based video search: Finding videos you have seen before based on your memory
TOMCCAP
2014