Movie Keyframe Retrieval Based on Cross-Media Correlation Detection and Context Model

In this paper, we propose a novel cross-media correlation detection method for movie keyframe retrieval. We first compute the temporal saliency of the video and audio streams of a movie separately, and then locate resonance regions, where the saliency changes in the two modalities are strongly correlated. Next, starting from these resonance regions, we propagate the similarity of visual and auditory characteristics to neighboring regions of the movie based on a temporal movie context model, segmenting the movie into a sequence of coherent parts from which keyframes are extracted. Experimental results on real movie clips show that, compared with single-modality algorithms, our method improves retrieval completeness and precision because it efficiently exploits the context and the correlations between complementary modalities.
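
The resonance-region step can be illustrated with a small sketch. The abstract does not say how saliency is computed, which correlation measure is used, or how large the analysis window is, so everything here is an assumption: the helper names `pearson`, `resonance_regions`, and `pick_keyframes` are hypothetical, "strongly correlated saliency changes" is modeled as a sliding-window Pearson correlation of the first differences of per-frame saliency scores exceeding a threshold, and the keyframe is simply the frame with the highest summed audio and video saliency. The context-model propagation and segmentation step is not modeled, so keyframes are chosen per resonance region rather than per coherent segment.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def resonance_regions(video_sal: Sequence[float], audio_sal: Sequence[float],
                      window: int = 4, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Hypothetical stand-in: inclusive (start, end) frame ranges where the
    saliency *changes* of the two streams correlate at or above `threshold`
    within a sliding window of `window` differences."""
    if len(video_sal) != len(audio_sal):
        raise ValueError("streams must be aligned to the same time steps")
    # Saliency changes: dv[t] = v[t+1] - v[t], likewise for audio.
    dv = [b - a for a, b in zip(video_sal, video_sal[1:])]
    da = [b - a for a, b in zip(audio_sal, audio_sal[1:])]
    marked = [False] * len(video_sal)
    for t in range(len(dv) - window + 1):
        if pearson(dv[t:t + window], da[t:t + window]) >= threshold:
            # Differences t .. t+window-1 span frames t .. t+window.
            for f in range(t, t + window + 1):
                marked[f] = True
    # Collapse runs of marked frames into (start, end) ranges.
    regions: List[Tuple[int, int]] = []
    start = None
    for f, m in enumerate(marked):
        if m and start is None:
            start = f
        elif not m and start is not None:
            regions.append((start, f - 1))
            start = None
    if start is not None:
        regions.append((start, len(marked) - 1))
    return regions


def pick_keyframes(video_sal: Sequence[float], audio_sal: Sequence[float],
                   regions: List[Tuple[int, int]]) -> List[int]:
    """Assumed rule: one keyframe per region, the frame with the highest summed saliency."""
    return [max(range(s, e + 1), key=lambda f: video_sal[f] + audio_sal[f])
            for s, e in regions]


if __name__ == "__main__":
    # Toy, unitless saliency scores (not data from the paper).
    video = [0, 0, 0, 1, 3, 2, 4, 4, 4, 4]
    audio = [5, 4, 5, 7, 11, 9, 13, 17, 13, 14]
    regions = resonance_regions(video, audio, window=3, threshold=0.8)
    print(regions)                                 # [(1, 6)]
    print(pick_keyframes(video, audio, regions))   # [6]
```

In the toy input the two streams change together over frames 1 to 6, so only that span is reported. Windows where the video saliency is flat yield a correlation of 0.0 and are never marked, which is one reason to correlate changes rather than raw saliency levels.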
