Audio-visual query and retrieval: A system that uses dynamic programming and relevance feedback

A necessary capability for content-based retrieval is to support the paradigm of query by example. In the past, there have been several attempts to use low-level features for video retrieval. However, most of the approaches support queries using image se- quences only. We present an algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video re- trieval. The novel ability of our approach to use the information con- tent in multiple media coupled with a strong emphasis on temporal similarity differentiates it from the state of the art in content-based retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. Coupling the use of audio with video this algorithm can be applied to grouping of shots based on audio-visual similarity. We also support relevance feedback. The user can provide feed- back to the system, by choosing clips, which are closer to the user's desired target. The system then automatically adjusts the relative weights or relevance of the media and fetches different sets of tar- get clips accordingly. It is our observation that a few iterations of such feedback are generally sufficient, for retrieving the desired video clips. © 2001 SPIE and IS&T. (DOI: 10.1117/1.1407822)

[1]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[2]  R. Bellman Dynamic programming. , 1957, Science.

[3]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[4]  Amarnath Gupta,et al.  Virage image search engine: an open framework for image management , 1996, Electronic Imaging.

[5]  Anil K. Jain,et al.  Shape-Based Retrieval: A Case Study With Trademark Image Databases , 1998, Pattern Recognit..

[6]  Rakesh Mohan,et al.  Video sequence matching , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[7]  Nuno Vasconcelos,et al.  Bayesian modeling of video editing and structure: semantic features for video summarization and browsing , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[8]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Proceedings of International Conference on Image Processing.

[9]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[10]  Milind R. Naphade,et al.  Probabilistic Semantic Video Indexing , 2000, NIPS.

[11]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[12]  Wei Xiong,et al.  Query by video clip , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[13]  Minerva M. Yeung,et al.  Efficient matching and clustering of video shots , 1995, Proceedings., International Conference on Image Processing.

[14]  Sanjeev R. Kulkarni,et al.  Automated analysis and annotation of basketball video , 1997, Electronic Imaging.

[15]  Shih-Fu Chang,et al.  VisualSEEk: a fully automated content-based image query system , 1997, MULTIMEDIA '96.

[16]  Shih-Fu Chang,et al.  Spatio-temporal video search using the object based video representation , 1997, Proceedings of International Conference on Image Processing.

[17]  K. Ramchandran,et al.  A factor graph framework for semantic indexing and retrieval in video , 2000, 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries.

[18]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Multimedia Systems.

[19]  A. Murat Tekalp,et al.  A high-performance shot boundary detection algorithm using multiple cues , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[20]  Milind R. Naphade,et al.  A probabilistic framework for semantic indexing and retrieval in video , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[21]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[22]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[23]  Milind R. Naphade,et al.  Novel scheme for fast and efficent video sequence matching using compact signatures , 1999, Electronic Imaging.

[24]  Yücel Altunbasak,et al.  Content-based video retrieval and compression: a unified solution , 1997, Proceedings of International Conference on Image Processing.

[25]  C.-C. Jay Kuo,et al.  Integrated approach to multimodal media content analysis , 1999, Electronic Imaging.

[26]  David S. Doermann,et al.  Identifying sports videos using replay, text, and camera motion features , 1999, Electronic Imaging.