Concept-Based Video Retrieval

In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives, most of which are concept-based. Central to our discussion, therefore, is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each component we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as those carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.

[1]  John Adcock,et al.  Experiments in interactive video search by addition and subtraction , 2008, CIVR '08.

[2]  Jean Tague-Sutcliffe,et al.  The Pragmatics of Information Retrieval Experimentation Revisited , 1997, Inf. Process. Manag..

[3]  JainRamesh,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000 .

[4]  Marieke Guy,et al.  Folksonomies: Tidying Up Tags? , 2006, D Lib Mag..

[5]  Wei-Hao Lin,et al.  News video classification using SVM-based multimodal classifiers and combination strategies , 2002, MULTIMEDIA '02.

[6]  Alan F. Smeaton,et al.  iBingo mobile collaborative search , 2008, CIVR '08.

[7]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[8]  Minh N. Do,et al.  Integrated Browsing and Searching of Large Image Collections , 2000, VISUAL.

[9]  Martha Larson,et al.  Overview of VideoCLEF 2008: Automatic Generation of Topic-based Feeds for Dual Language Audio-Visual Content , 2008, CLEF.

[10]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[11]  Marcel Worring,et al.  On the surplus value of semantic video analysis beyond the key frame , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[12]  Euripides G. M. Petrakis,et al.  Matching and Retrieval of Distorted and Occluded Shapes Using Dynamic Programming , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Stefan M. Rüger,et al.  Information-theoretic semantic multimedia indexing , 2007, CIVR '07.

[14]  Jianping Fan,et al.  ClassView: hierarchical video shot classification, indexing, and accessing , 2004, IEEE Transactions on Multimedia.

[15]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[16]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[17]  Rong Yan,et al.  Video Retrieval Based on Semantic Concepts , 2008, Proceedings of the IEEE.

[18]  Shahram Ebadollahi,et al.  Visual Event Detection using Multi-Dimensional Concept Dynamics , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[19]  Hugo Liu,et al.  ConceptNet — A Practical Commonsense Reasoning Tool-Kit , 2004 .

[20]  Dong Wang,et al.  Video diver: generic video indexing with diverse features , 2007, MIR '07.

[21]  Joemon M. Jose,et al.  Glasgow University at TRECVid 2006 , 2006, TRECVID.

[22]  Howard D. Wactlar,et al.  Putting active learning into multimedia applications: dynamic definition and refinement of concept classifiers , 2005, MULTIMEDIA '05.

[23]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Trygve Randen,et al.  Filtering for Texture Classification: A Comparative Study , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Mohan S. Kankanhalli,et al.  Application Potential of Multimedia Information Retrieval , 2008, Proceedings of the IEEE.

[26]  Ramin Zabih,et al.  Comparing images using color coherence vectors , 1997, MULTIMEDIA '96.

[27]  Seyed M. M. Tahaghoghi,et al.  Modeling Human Judgment of Digital Imagery for Multimedia Retrieval , 2007, IEEE Transactions on Multimedia.

[28]  Glorianna Davenport,et al.  Cinematic primitives for multimedia , 1991, IEEE Computer Graphics and Applications.

[29]  Michael G. Christel,et al.  Mining Novice User Activity with TRECVID Interactive Retrieval Tasks , 2006, CIVR.

[30]  Frank Nack,et al.  Saying What it Means: Semi-Automated (News) Media Annotation , 2004, Multimedia Tools and Applications.

[31]  Paul Over,et al.  TRECVID 2005 - An Overview , 2005, TRECVID.

[32]  Maarten de Rijke,et al.  Exploiting redundancy in cross-channel video retrieval , 2007, MIR '07.

[33]  Stéphane Ayache,et al.  Video Corpus Annotation Using Active Learning , 2008, ECIR.

[34]  Toshikazu Kato,et al.  A sketch retrieval method for full color image database-query by visual example , 1992, [1992] Proceedings. 11th IAPR International Conference on Pattern Recognition.

[35]  Marcel Worring,et al.  Multimodal Video Indexing : A Review of the State-ofthe-art , 2001 .

[36]  Alan F. Smeaton,et al.  Designing the User Interface for the Físchlár Digital Video Library , 2006, J. Digit. Inf..

[37]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[38]  Stephen E. Robertson,et al.  Okapi at TREC-4 , 1995, TREC.

[39]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[40]  Anil K. Jain,et al.  Image classification for content-based indexing , 2001, IEEE Trans. Image Process..

[41]  Chong-Wah Ngo,et al.  Towards optimal bag-of-features for object categorization and semantic video retrieval , 2007, CIVR '07.

[42]  Marcel Worring,et al.  Interactive access to large image collections using similarity-based visualization , 2008, J. Vis. Lang. Comput..

[43]  Andrew Zisserman,et al.  Object Level Grouping for Video Shots , 2004, International Journal of Computer Vision.

[44]  Wei-Hao Lin,et al.  Assessing Effectiveness in Video Retrieval , 2005, CIVR.

[45]  Shih-Fu Chang,et al.  CU-VIREO 374 : Fusing Columbia 374 and VIREO 374 for Large Scale Semantic Concept Detection , 2008 .

[46]  Chong-Wah Ngo,et al.  Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces , 2008, IEEE Transactions on Multimedia.

[47]  Changsheng Xu,et al.  Using Webcast Text for Semantic Event Detection in Broadcast Sports Video , 2008, IEEE Transactions on Multimedia.

[48]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[49]  Tat-Seng Chua,et al.  Fusion of AV features and external information sources for event detection in team sports video , 2006, TOMCCAP.

[50]  Emine Yilmaz,et al.  Estimating average precision when judgments are incomplete , 2007, Knowledge and Information Systems.

[51]  Haim H. Permuter,et al.  IBM Research TREC 2002 Video Retrieval System , 2002, TREC.

[52]  Hua Li,et al.  Mobile Search With Multimodal Queries , 2008, Proceedings of the IEEE.

[53]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[54]  David A. Shamma,et al.  Watch what I watch: using community activity to understand content , 2007, MIR '07.

[55]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[56]  Rong Yan,et al.  Probabilistic latent query analysis for combining multiple retrieval sources , 2006, SIGIR.

[57]  Eero Hyvönen,et al.  Ontology-Based Image Retrieval , 2003, WWW.

[58]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[59]  Shih-Fu Chang,et al.  MediaNet: a multimedia information network for knowledge representation , 2000, SPIE Optics East.

[60]  Jianping Fan,et al.  Analyzing Large-Scale News Video Databases to Support Knowledge Visualization and Intuitive Retrieval , 2007, 2007 IEEE Symposium on Visual Analytics Science and Technology.

[61]  Ming-Kuei Hu,et al.  Visual pattern recognition by moment invariants , 1962, IRE Trans. Inf. Theory.

[62]  Marcel Worring,et al.  Annotating images by harnessing worldwide user-tagged photos , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[63]  R. Goulden,et al.  How large can a receptive vocabulary be? , 1990 .

[64]  Marcel Worring,et al.  High-Performance Distributed Video Content Analysis with Parallel-Horus , 2007, IEEE MultiMedia.

[65]  John Adcock,et al.  FXPAL MediaMagic video search system , 2007, CIVR '07.

[66]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[67]  Steffen Staab,et al.  Knowledge representation and semantic annotation of multimedia content , 2006 .

[68]  Shih-Fu Chang,et al.  Context-Based Concept Fusion with Boosted Conditional Random Fields , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[69]  Goujun Lu,et al.  Indexing and Retrieval of Audio: A Survey , 2001, Multimedia Tools and Applications.

[70]  Jing Huang,et al.  Spatial Color Indexing and Applications , 2004, International Journal of Computer Vision.

[71]  Tat-Seng Chua Towards the next plateau: innovative multimedia research beyond trecvid , 2007, ACM Multimedia.

[72]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[73]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[74]  Thijs Westerveld,et al.  Using generative probabilistic models for multimedia retrieval , 2005, SIGF.

[75]  Franciska de Jong,et al.  Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition , 2007, SAMT.

[76]  Marcel Worring,et al.  Building a visual ontology for video retrieval , 2005, MULTIMEDIA '05.

[77]  Arnold W. M. Smeulders,et al.  Color Invariance , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[78]  Alan F. Smeaton,et al.  K-Space Interactive Search , 2008, CIVR '08.

[79]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[80]  Marcel Worring,et al.  High-Performance Distributed Image and Video Content Analysis with Parallel-Horus , 2007 .

[81]  Ching-Yung Lin,et al.  Multi-granular detection of regional semantic concepts [video annotation] , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[82]  Chong-Wah Ngo,et al.  Measuring novelty and redundancy with multiple modalities in cross-lingual broadcast news , 2008, Comput. Vis. Image Underst..

[83]  Marcel Worring,et al.  Detection of TV news monologues by style analysis , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[84]  Nicu Sebe,et al.  Content-based multimedia information retrieval: State of the art and challenges , 2006, TOMCCAP.

[85]  Rajesh Shenoy,et al.  On the robustness of relevance measures with incomplete judgments , 2007, SIGIR.

[86]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[87]  Arnold W. M. Smeulders,et al.  Color texture measurement and segmentation , 2005, Signal Process..

[88]  Tao Mei,et al.  Building a comprehensive ontology to refine video concept detection , 2007, MIR '07.

[89]  Meng Wang,et al.  MSRA-USTC-SJTU at TRECVID 2007: High-Level Feature Extraction and Search , 2007, TRECVID.

[90]  José Luis Vicedo González,et al.  TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[91]  Alan Hanjalic,et al.  Content-Based Analysis of Digital Video , 2004, Springer US.

[92]  Luis von Ahn Games with a Purpose , 2006, Computer.

[93]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Multimedia Systems.

[94]  Alireza Khotanzad,et al.  Invariant Image Recognition by Zernike Moments , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[95]  Anil K. Jain,et al.  On image classification: city images vs. landscapes , 1998, Pattern Recognit..

[96]  Gang Wang,et al.  TRECVID 2004 Search and Feature Extraction Task by NUS PRIS , 2004, TRECVID.

[97]  Rong Yan,et al.  Mining Relationship Between Video Concepts using Probabilistic Graphical Models , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[98]  Thomas S. Huang,et al.  Relevance feedback in image retrieval: A comprehensive review , 2003, Multimedia Systems.

[99]  Rainer Lienhart,et al.  The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away? , 2008 .

[100]  Arnold W. M. Smeulders,et al.  Color-based object recognition , 1997, Pattern Recognit..

[101]  Yihong Gong,et al.  Automatic parsing and indexing of news video , 1995, Multimedia Systems.

[102]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[103]  Shih-Fu Chang,et al.  Story boundary detection in large broadcast news video archives: techniques, experience and trends , 2004, MULTIMEDIA '04.

[104]  B. S. Manjunath,et al.  Unsupervised Segmentation of Color-Texture Regions in Images and Video , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[105]  Alan F. Smeaton,et al.  Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience , 2005, CIVR.

[106]  Arnold W. M. Smeulders,et al.  Visual quasi-periodicity , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[107]  Rong Yan,et al.  Learning query-class dependent weights in automatic video retrieval , 2004, MULTIMEDIA '04.

[108]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[109]  A. P. deVries,et al.  Multimedia retrieval using multiple images , 2004 .

[110]  Bernardo A. Huberman,et al.  Usage patterns of collaborative tagging systems , 2006, J. Inf. Sci..

[111]  Shih-Fu Chang,et al.  Query-Adaptive Fusion for Multimodal Search , 2008, Proceedings of the IEEE.

[112]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[113]  Ming-Syan Chen,et al.  Association and Temporal Rule Mining for Post-Filtering of Semantic Concept Detection in Video , 2008, IEEE Transactions on Multimedia.

[114]  Paul Over,et al.  High-level feature detection from video in TRECVid: a 5-year retrospective of achievements , 2009 .

[115]  R. Brunelli,et al.  A Survey on the Automatic Indexing of Video Data, , 1999, J. Vis. Commun. Image Represent..

[116]  Noel E. O'Connor,et al.  Inexpensive fusion methods for enhancing feature detection , 2007, Signal Process. Image Commun..

[117]  Pinar Duygulu Sahin,et al.  Joint visual-text modeling for automatic retrieval of multimedia documents , 2005, ACM Multimedia.

[118]  Ricky Houghton Named Faces: Putting Names to Faces , 1999, IEEE Intell. Syst..

[119]  Anthony Hoogs,et al.  Video content annotation using visual analysis and a large semantic knowledgebase , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[120]  Bob J. Wielinga,et al.  Ontology-Based Photo Annotation , 2001, IEEE Intell. Syst..

[121]  Anil K. Jain,et al.  Unsupervised texture segmentation using Gabor filters , 1990, 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings.

[122]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[123]  Marcel Worring,et al.  Adding Semantics to Detectors for Video Retrieval , 2007, IEEE Transactions on Multimedia.

[124]  Georges Quénot,et al.  CLIPS at TREC 11: Experiments in Video Retrieval , 2002, TREC.

[125]  Noboru Babaguchi,et al.  Personalized abstraction of broadcasted American football video by highlight selection , 2004, IEEE Transactions on Multimedia.

[126]  Tao Mei,et al.  Multi-Layer Multi-Instance Learning for Video Concept Detection , 2008, IEEE Transactions on Multimedia.

[127]  John R. Smith,et al.  Normalized classifier fusion for semantic visual concept detection , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[128]  Alan F. Smeaton,et al.  A Comparison of Score, Rank and Probability-Based Fusion Methods for Video Shot Retrieval , 2005, CIVR.

[129]  Jianping Fan,et al.  Incorporating Concept Ontology for Hierarchical Video Classification, Annotation, and Visualization , 2007, IEEE Transactions on Multimedia.

[130]  Ellen M. Voorhees,et al.  TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing) , 2005 .

[131]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[132]  Yung-Yu Chuang,et al.  Multi-cue fusion for semantic video indexing , 2008, ACM Multimedia.

[133]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in News Videos , 1999, IEEE Multim..

[134]  Stéphane Ayache,et al.  Classifier Fusion for SVM-Based Multimedia Semantic Indexing , 2007, ECIR.

[135]  Yukinobu Taniguchi,et al.  Structured Video Computing , 1994, IEEE MultiMedia.

[136]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[137]  Alberto Del Bimbo,et al.  Visual information retrieval , 1999 .

[138]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[139]  Karen Spärck Jones,et al.  Automatic content-based retrieval of broadcast news , 1995, MULTIMEDIA '95.

[140]  Shih-Fu Chang,et al.  A reranking approach for context-based concept fusion in video indexing and retrieval , 2007, CIVR '07.

[141]  Shih-Fu Chang,et al.  Revision of LSCOM Event/Activity Annotations , 2006 .

[142]  Jitendra Malik,et al.  Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[143]  Harry W. Agius,et al.  Video summarisation: A conceptual framework and survey of the state of the art , 2008, J. Vis. Commun. Image Represent..

[144]  Michael G. Strintzis,et al.  Knowledge-assisted semantic video object detection , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[145]  Bo Zhang,et al.  A Formal Study of Shot Boundary Detection , 2007, IEEE Transactions on Circuits and Systems for Video Technology.

[146]  Sydney S. Weinstein,et al.  Is everything miscellaneous , 1992 .

[147]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[148]  Shih-Fu Chang,et al.  Columbia University’s Baseline Detectors for 374 LSCOM Semantic Visual Concepts , 2007 .

[149]  Marcel Worring,et al.  Semantic Image and Video Indexing in Broad Domains , 2007, IEEE Trans. Multim..

[150]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[151]  Yihong Gong,et al.  Lessons Learned from Building a Terabyte Digital Video Library , 1999, Computer.

[152]  Klara Nahrstedt,et al.  Multimedia: Computing, Communications and Applications , 1994 .

[153]  Jing Zhang,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[154]  Cor J. Veenman,et al.  Robust Scene Categorization by Learning Image Statistics in Context , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[155]  Yi Wu,et al.  Ontology-based multi-classification learning for video concept detection , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[156]  Shih-Fu Chang,et al.  Pattern Mining in Visual Concept Streams , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[157]  Tobun Dorbin Ng,et al.  Collages as dynamic summaries for news video , 2002, MULTIMEDIA '02.

[158]  Qibin Sun,et al.  Video Browsing on Handheld Devices—Interface Designs for the Next Generation of Mobile Video Players , 2008, IEEE MultiMedia.

[159]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[160]  Meng Wang,et al.  Optimizing multi-graph learning: towards a unified video annotation scheme , 2007, ACM Multimedia.

[161]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.

[162]  Michael S. Lew,et al.  Principles of Visual Information Retrieval , 2001, Advances in Pattern Recognition.

[163]  Changsheng Xu,et al.  A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video , 2008, IEEE Transactions on Multimedia.

[164]  Tim O'Reilly,et al.  What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software , 2007 .

[165]  B. S. Manjunath,et al.  Introduction to MPEG-7: Multimedia Content Description Interface , 2002 .

[166]  Rong Yan,et al.  A review of text and image retrieval approaches for broadcast news video , 2007, Information Retrieval.

[167]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[168]  James Ze Wang,et al.  SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[169]  Eric Bruno,et al.  Design of Multimodal Dissimilarity Spaces for Retrieval of Video Documents , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[170]  Marc Davis,et al.  Editing out Video Editing , 2003, IEEE Multim..

[171]  Lexing Xie,et al.  Event Mining in Multimedia Streams , 2008, Proceedings of the IEEE.

[172]  Timo Ojala,et al.  Cluster-temporal browsing of large news video databases , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[173]  Gertjan J. Burghouts,et al.  Performance evaluation of local colour invariants , 2009, Comput. Vis. Image Underst..

[174]  Marcel Worring,et al.  Query on demand video browsing , 2007, ACM Multimedia.

[175]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[176]  Shih-Fu Chang,et al.  Reranking Methods for Visual Search , 2007, IEEE MultiMedia.

[177]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[178]  Shih-Fu Chang,et al.  Visual islands: intuitive browsing of visual search results , 2008, CIVR '08.

[179]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[180]  Marcel Worring,et al.  Learning rich semantics from news video archives by style analysis , 2006, TOMCCAP.

[181]  Atreyi Kankanhalli,et al.  Automatic partitioning of full-motion video , 1993, Multimedia Systems.

[182]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[183]  Paul Over,et al.  The trecvid 2007 BBC rushes summarization evaluation pilot , 2007, TVS '07.

[184]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[185]  John R. Smith,et al.  Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues , 2003, EURASIP J. Adv. Signal Process..

[186]  Edward Y. Chang,et al.  Multimodal concept-dependent active learning for image retrieval , 2004, MULTIMEDIA '04.

[187]  H. Bourlard,et al.  Interpretation of Multiparty Meetings the AMI and Amida Projects , 2008, 2008 Hands-Free Speech Communication and Microphone Arrays.

[188]  Cor J. Veenman,et al.  The influence of cross-validation on video classification performance , 2006, MM '06.

[189]  Meng Wang,et al.  Correlative multilabel video annotation with temporal kernels , 2008, TOMCCAP.

[190]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[191]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[192]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[193]  Dong Wang,et al.  The importance of query-concept-mapping for automatic video retrieval , 2007, ACM Multimedia.

[194]  Alexander G. Hauptmann,et al.  Successful approaches in the TREC video retrieval evaluations , 2004, MULTIMEDIA '04.

[195]  Azriel Rosenfeld,et al.  Picture Processing by Computer , 1969, CSUR.

[196]  Rong Yan,et al.  IBM multimedia analysis and retrieval system , 2008, CIVR '08.

[197]  Marcel Worring,et al.  The Role of Visual Content and Style for Concert Video Indexing , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[198]  Dong Wang,et al.  Video search in concept subspace: a text-like paradigm , 2007, CIVR '07.

[199]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[200]  Marcel Worring,et al.  Balancing thread based navigation for targeted video search , 2008, CIVR '08.

[201]  Thijs Westerveld,et al.  Multimedia Retrieval Using Multiple Examples , 2004, CIVR.

[202]  Nuno Vasconcelos,et al.  Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.

[203]  Tobun Dorbin Ng,et al.  Informedia at TRECVID 2003 : Analyzing and Searching Broadcast News Video , 2003, TRECVID.

[204]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[205]  Alexander G. Hauptmann,et al.  The Use and Utility of High-Level Semantic Features in Video Retrieval , 2005, CIVR.

[206]  Rong Yan,et al.  Extreme video retrieval: joint maximization of human and computer performance , 2006, MM '06.

[207]  Marcel Worring,et al.  Multimedia event-based video indexing using time intervals , 2005, IEEE Transactions on Multimedia.

[208]  John R. Smith,et al.  Multi-granular detection of regional semantic concepts , 2004, ICME.

[209]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[210]  Noboru Babaguchi,et al.  Event based indexing of broadcasted sports video by intermodal collaboration , 2002, IEEE Trans. Multim..

[211]  Daniel Heesch,et al.  A survey of browsing models for content based image retrieval , 2008, Multimedia Tools and Applications.

[212]  Sargur N. Srihari,et al.  Decision Combination in Multiple Classifier Systems , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[213]  Edward Y. Chang,et al.  Optimal multimodal fusion for multimedia data analysis , 2004, MULTIMEDIA '04.

[214]  Tao Wang,et al.  One step beyond histograms: Image representation using Markov stationary features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[215]  Wilson S. Geisler,et al.  Multichannel Texture Analysis Using Localized Spatial Filters , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[216]  Alan Hanjalic,et al.  Affective video content representation and modeling , 2005, IEEE Transactions on Multimedia.

[217]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[218]  Apostol Natsev,et al.  Web-based information content and its application to concept-based video retrieval , 2008, CIVR '08.

[219]  Alberto Del Bimbo,et al.  Visual Image Retrieval by Elastic Matching of User Sketches , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[220]  John R. Smith,et al.  A web-based system for collaborative annotation of large image and video collections: an evaluation and user study , 2005, MULTIMEDIA '05.

[221]  Dong Wang,et al.  THU and ICRC at TRECVID 2007 , 2007, TRECVID.

[222]  Rong Yan,et al.  The combination limit in multimedia retrieval , 2003, MULTIMEDIA '03.

[223]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[224]  Milind R. Naphade On supervision and statistical learning for semantic multimedia analysis , 2004, J. Vis. Commun. Image Represent..

[225]  Alan F. Smeaton Techniques used and open challenges to the analysis, indexing and retrieval of digital video , 2007, Inf. Syst..

[226]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[227]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[228]  Anil K. Jain,et al.  Statistical Pattern Recognition: A Review , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[229]  Marcel Worring,et al.  Detection of moving objects in video using a robust motion similarity measure , 2000, IEEE Trans. Image Process..

[230]  Milind R. Naphade,et al.  Extracting semantics from audio-visual content: the final frontier in multimedia retrieval , 2002, IEEE Trans. Neural Networks.

[231]  Mor Naaman,et al.  HT06, tagging paper, taxonomy, Flickr, academic article, to read , 2006, HYPERTEXT '06.

[232]  Daniel Lewis,et al.  What is web 2.0? , 2006, CROS.

[233]  Ulrich Eckhardt,et al.  Shape descriptors for non-rigid shapes with a single closed contour , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[234]  John R. Kender,et al.  Visual concepts for news story tracking: analyzing and exploiting the NIST TRESVID video annotation experiment , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[235]  Paul Over,et al.  TRECVID 2006 Overview , 2006, TRECVID.

[236]  Anil K. Jain,et al.  Shape-Based Retrieval: A Case Study With Trademark Image Databases , 1998, Pattern Recognit..

[237]  Shih-Fu Chang,et al.  Automatic discovery of query-class-dependent models for multimodal search , 2005, MULTIMEDIA '05.

[238]  Ming-yu Chen,et al.  Multi-modal classification in digital news libraries , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004..

[239]  Rong Yan,et al.  Negative pseudo-relevance feedback in content-based video retrieval , 2003, MULTIMEDIA '03.

[240]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[241]  Zhu Liu,et al.  Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..

[242]  Paul Over,et al.  TRECVID 2008 - Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2010, TRECVID.

[243]  Ba Tu Truong,et al.  Video abstraction: A systematic review and classification , 2007, TOMCCAP.

[244]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[245]  Peter G. B. Enser,et al.  Visual image retrieval: seeking the alliance of concept-based and content-based paradigms , 2000, J. Inf. Sci..

[246]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[247]  Katja Hofmann,et al.  Assessing concept selection for video retrieval , 2008, MIR '08.

[248]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[249]  Remco C. Veltkamp,et al.  State of the Art in Shape Matching , 2001, Principles of Visual Information Retrieval.

[250]  Ramesh C. Jain,et al.  Metadata in video databases , 1994, SGMD.

[251]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[252]  François Brémond,et al.  ETISEO, performance evaluation for video surveillance systems , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.

[253]  Antonio Torralba,et al.  80 Million Tiny Images: A Large Dataset for Nonparametric Object and Scene Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[254]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[255]  Bernardo A. Huberman,et al.  The Structure of Collaborative Tagging Systems , 2005, ArXiv.

[256]  Jun Yang,et al.  (Un)Reliability of video concept detection , 2008, CIVR '08.

[257]  Marcel Worring,et al.  A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval , 2007, IEEE Transactions on Multimedia.

[258]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[259]  Yongdong Zhang,et al.  Segregated feedback with performance-based adaptive sampling for interactive news video retrieval , 2007, ACM Multimedia.

[260]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[261]  Ramanathan V. Guha,et al.  Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .

[262]  Edward Y. Chang,et al.  Active Learning for Interactive Multimedia Retrieval , 2008, Proceedings of the IEEE.

[263]  Marcel Worring,et al.  VideOlympics: Real-Time Evaluation of Multimedia Retrieval Systems , 2008, IEEE MultiMedia.

[264]  John R. Smith,et al.  Multimedia semantic indexing using model vectors , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[265]  Jun Yang,et al.  Finding Person X: Correlating Names with Visual Appearances , 2004, CIVR.

[266]  Franciska de Jong,et al.  OLIVE: Speech-Based Video Retrieval , 1998 .

[267]  Alan F. Smeaton,et al.  Validating the Detection of Everyday Concepts in Visual Lifelogs , 2008, SAMT.

[268]  Arnold W. M. Smeulders,et al.  PicToSeek: combining color and shape invariant features for image retrieval , 2000, IEEE Trans. Image Process..

[269]  Ellen M. Voorhees,et al.  Retrieval evaluation with incomplete information , 2004, SIGIR '04.

[270]  Michael G. Christel,et al.  Exploiting multiple modalities for interactive video retrieval , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[271]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[272]  Yiannis Kompatsiaris,et al.  K-Space at TRECvid 2006 , 2006, TRECVID.

[273]  Marcel Worring,et al.  Interactive Search by Direct Manipulation of Dissimilarity Space , 2007, IEEE Transactions on Multimedia.

[274]  Alberto Del Bimbo,et al.  Automatic video annotation using ontologies extended with visual information , 2005, MULTIMEDIA '05.

[275]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[276]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[277]  Stefan M. Rüger,et al.  Image Browsing: Semantic Analysis of NN^k Networks , 2005, CIVR.

[278]  Krystyna K. Matusiak.  Towards user-centered indexing in digital image collections , 2006, OCLC Syst. Serv..

[279]  Theo Gevers,et al.  Adaptive Image Segmentation by Combining Photometric Invariant Region and Edge Information , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[280]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[281]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..

[282]  Xian-Sheng Hua,et al.  Video search re-ranking via multi-graph propagation , 2007, ACM Multimedia.

[283]  Luc Van Gool,et al.  Moment invariants for recognition under changing viewpoint and illumination , 2004, Comput. Vis. Image Underst..

[284]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[285]  Wei Dai,et al.  Joint categorization of queries and clips for web-based video search , 2006, MIR '06.

[286]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[287]  Shih-Fu Chang,et al.  CuZero: embracing the frontier of interactive visual search for informed users , 2008, MIR '08.

[288]  Arnold W. M. Smeulders,et al.  A Six-Stimulus Theory for Stochastic Texture , 2002 .

[289]  Andrew Zisserman,et al.  Efficient Visual Search for Objects in Videos , 2008, Proceedings of the IEEE.

[290]  John R. Kender,et al.  VAST MM: multimedia browser for presentation video , 2007, CIVR '07.

[291]  Peter L. Stanchev,et al.  Multimedia Retrieval , 2007, Data-Centric Systems and Applications.

[292]  Nicu Sebe,et al.  Texture Features for Content-Based Retrieval , 2001, Principles of Visual Information Retrieval.

[293]  Chong-Wah Ngo,et al.  Columbia University/VIREO-CityU/IRIT TRECVID2008 High-Level Feature Extraction and Interactive Video Search , 2008, TRECVID.

[294]  Shih-Fu Chang,et al.  Cross-domain learning methods for high-level visual concept classification , 2008, 2008 15th IEEE International Conference on Image Processing.

[295]  Charles A. Bouman,et al.  ViBE: a compressed video database structured for active browsing and search , 2004, IEEE Transactions on Multimedia.

[296]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[297]  Stéphane Ayache,et al.  Evaluation of active learning strategies for video indexing , 2007, Signal Process. Image Commun..