Latent semantic indexing for semantic content detection of video shots

Low-level features alone are no longer sufficient to build effective content-based retrieval systems. Users are no longer interested in retrieving visually similar content; they expect retrieval systems to find documents with similar semantic content. Bridging the gap between low-level features and semantic content is a challenging task that future retrieval systems must address. Latent semantic indexing (LSI) was successfully introduced to index text documents efficiently. We propose to adapt this technique to represent the visual content of video shots efficiently for semantic content detection. Although we restrict our approach to visual features, it can be extended with minor changes to audio and motion features to build a multi-modal system. The semantic content is then detected with two classifiers: k-nearest neighbors and a neural network. Finally, in the experimental section, we report the performance of each classifier and the gain obtained with LSI features compared to traditional features.
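The pipeline outlined above can be illustrated with a minimal sketch. It assumes that each shot has already been summarized as a vector of occurrence counts over a dictionary of "visual terms" (the abstract does not specify how such a dictionary is built, so this representation is an assumption). LSI is computed here with a truncated singular value decomposition, and the k-nearest-neighbors step uses cosine similarity with a majority vote. The function names, the toy data, and the parameter values are hypothetical and chosen only for illustration; the neural network classifier is not shown.

```python
import numpy as np

def fit_lsi(counts, k):
    """Fit LSI on a (n_terms, n_shots) occurrence matrix.

    Returns the first k left singular vectors, shape (n_terms, k).
    """
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k]

def project(U_k, counts):
    """Project shots (columns of counts) into the k-dimensional latent space.

    Returns an array of shape (n_shots, k).
    """
    return (U_k.T @ counts).T

def knn_predict(train_feats, train_labels, query_feats, n_neighbors=3):
    """Binary k-NN using cosine similarity and a majority vote.

    train_labels must contain 0/1 values (feature absent/present).
    """
    def normalize(x):
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.maximum(norms, 1e-12)

    sims = normalize(query_feats) @ normalize(train_feats).T
    # Indices of the n_neighbors most similar training shots per query.
    nearest = np.argsort(-sims, axis=1)[:, :n_neighbors]
    votes = train_labels[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

# Toy data (hypothetical): 6 visual terms, 6 training shots.
# Shots 0-2 mostly use terms 0-2; shots 3-5 mostly use terms 3-5.
train_counts = np.array([
    [3, 2, 4, 0, 0, 1],
    [2, 3, 3, 0, 1, 0],
    [4, 2, 2, 1, 0, 0],
    [0, 1, 0, 3, 2, 4],
    [0, 0, 1, 2, 4, 3],
    [1, 0, 0, 4, 3, 2],
], dtype=float)
train_labels = np.array([0, 0, 0, 1, 1, 1])

# Two unseen shots, one per column.
query_counts = np.array([
    [3, 0],
    [3, 1],
    [2, 0],
    [0, 2],
    [1, 3],
    [0, 4],
], dtype=float)

U_k = fit_lsi(train_counts, k=2)
train_feats = project(U_k, train_counts)
query_feats = project(U_k, query_counts)
predictions = knn_predict(train_feats, train_labels, query_feats)
print(predictions)
```

Because the query shots are projected with the same basis `U_k` that was fitted on the training shots, the sign ambiguity of the singular vectors does not affect the similarities. In this sketch the number of latent dimensions `k` and the number of neighbors are free parameters that would need to be tuned on real data.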
