A review of text and image retrieval approaches for broadcast news video

The effectiveness of a video retrieval system depends largely on the choice of its underlying text and image retrieval components. The unique properties of video collections (e.g., multiple sources, noisy features, and temporal relations) suggest that we examine how these retrieval methods perform in such a multimodal environment and identify the relative importance of the underlying components. In this paper, we review a variety of text and image retrieval approaches, together with their individual components, in the context of broadcast news video. We discuss the main components of text and image retrieval in detail, including retrieval models, text sources, temporal expansion methods, query expansion methods, image features, and similarity measures. For each component, we conduct a series of retrieval experiments on TRECVID video collections to identify its advantages and disadvantages. To provide more complete coverage of video retrieval, we also briefly discuss an emerging approach called concept-based video retrieval and review strategies for combining multiple retrieval outputs.
