A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval

Effective video retrieval is the result of interplay between interactive query selection, advanced visualization of results, and a goal-oriented human user. Traditional interactive video retrieval approaches emphasize paradigms, such as query-by-keyword and query-by-example, to aid the user in the search for relevant footage. However, recent results in automatic indexing indicate that query-by-concept is becoming a viable resource for interactive retrieval also. We propose in this paper a new video retrieval paradigm. The core of the paradigm is formed by first detecting a large lexicon of semantic concepts. From there, we combine query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. To measure the impact of increasing lexicon size on interactive video retrieval performance, we performed two experiments against the 2004 and 2005 NIST TRECVID benchmarks, using lexicons containing 32 and 101 concepts, respectively. The results suggest that from all factors that play a role in interactive retrieval, a large lexicon of semantic concepts matters most. Indeed, by exploiting large lexicons, many video search questions are solvable without using query-by-keyword and query-by-example. In addition, we show that the lexicon-driven search engine outperforms all state-of-the-art video retrieval systems in both TRECVID 2004 and 2005

[1]  Alan F. Smeaton,et al.  Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience , 2005, CIVR.

[2]  Daniel Marcu,et al.  Machine translation in the year 2004 , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[3]  Sargur N. Srihari,et al.  Decision Combination in Multiple Classifier Systems , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Jianping Fan,et al.  Concept-oriented indexing of video databases: toward semantic sensitive retrieval and browsing , 2004, IEEE Transactions on Image Processing.

[5]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Multimedia Systems.

[6]  Jong-Hak Lee,et al.  Analyses of multiple evidence combination , 1997, SIGIR '97.

[7]  Alberto Del Bimbo,et al.  Visual information retrieval , 1999 .

[8]  Arnold W. M. Smeulders,et al.  Color Invariance , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Toshikazu Kato,et al.  A sketch retrieval method for full color image database-query by visual example , 1992, [1992] Proceedings. 11th IAPR International Conference on Pattern Recognition.

[10]  Marcel Worring,et al.  Learning rich semantics from news video archives by style analysis , 2006, TOMCCAP.

[11]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[12]  Wei-Hao Lin,et al.  Assessing Effectiveness in Video Retrieval , 2005, CIVR.

[13]  Georges Quénot,et al.  CLIPS at TREC 11: Experiments in Video Retrieval , 2002, TREC.

[14]  G. Medioni,et al.  Content-based image retrieval: an overview , 2004 .

[15]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[16]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[17]  Alan F. Smeaton,et al.  Designing the User Interface for the Físchlár Digital Video Library , 2006, J. Digit. Inf..

[18]  John Adcock,et al.  Interactive Video Search Using Multilevel Indexing , 2005, CIVR.

[19]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[20]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[21]  Anil K. Jain,et al.  Image classification for content-based indexing , 2001, IEEE Trans. Image Process..

[22]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[23]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[24]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[25]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[26]  Charles A. Bouman,et al.  ViBE: a compressed video database structured for active browsing and search , 2004, IEEE Transactions on Multimedia.

[27]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Jun Yang,et al.  CMU Informedia's TRECVID 2005 Skirmishes , 2005, TRECVID.

[29]  Tobun Dorbin Ng,et al.  Informedia at TRECVID 2003 : Analyzing and Searching Broadcast News Video , 2003, TRECVID.

[30]  Jianping Fan,et al.  ClassView: hierarchical video shot classification, indexing, and accessing , 2004, IEEE Transactions on Multimedia.

[31]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[32]  Alexander G. Hauptmann,et al.  Towards a Large Scale Concept Ontology for Broadcast Video , 2004, CIVR.

[33]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[34]  Paul Over,et al.  TRECVID: evaluating the effectiveness of information retrieval tasks on digital video , 2004, MULTIMEDIA '04.

[35]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[36]  John R. Smith,et al.  Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues , 2003, EURASIP J. Adv. Signal Process..

[37]  Jitendra Malik,et al.  Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[38]  Yihong Gong,et al.  Lessons Learned from Building a Terabyte Digital Video Library , 1999, Computer.

[39]  Vladimir Cherkassky,et al.  The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.

[40]  Alberto Del Bimbo,et al.  Visual Image Retrieval by Elastic Matching of User Sketches , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[41]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[42]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[43]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[44]  Shih-Fu Chang,et al.  MediaNet: a multimedia information network for knowledge representation , 2000, SPIE Optics East.

[45]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[46]  Arnold W. M. Smeulders,et al.  PicToSeek: combining color and shape invariant features for image retrieval , 2000, IEEE Trans. Image Process..

[47]  Michael G. Christel,et al.  Exploiting multiple modalities for interactive video retrieval , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[48]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.