A Two-View Concept Correlation Based Video Annotation Refinement

Concept correlation, which defines the relationships between concepts, has recently come to play an important role in video annotation (also called concept detection). To improve annotation performance, this paper presents a two-view, concept-correlation-based video annotation refinement that exploits data-specific spatial and temporal concept correlations. Specifically, instead of relying on a generic within-shot concept correlation, the spatial view estimates a data-specific concept correlation for each shot by introducing concept correlation bases that map low-level features to high-level concept distributions under the sparse representation framework. In addition, going beyond the temporal consistency of a single concept, a richer temporal correlation between different concepts located in the current shot and its neighboring shots is used to adjust the detection scores. Finally, these two types of concept correlation are integrated into a probability-based framework to refine the initial results produced by multiple concept detectors. Experiments on the TRECVID 2006-2008 datasets and comparisons with existing methods demonstrate the effectiveness of the approach.
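To make the pipeline concrete, below is a minimal sketch of the two views under assumed inputs. It is not the authors' implementation: the variable names (feature_bases, concept_bases, temporal_corr), the Lasso-based sparse coder, the neighbor-averaging temporal adjustment, and the linear fusion weights are all illustrative assumptions consistent with the abstract's description.

```python
# Hypothetical sketch of the two-view refinement; all names and
# weights are assumptions, not the paper's released code.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, k, n, T = 128, 20, 500, 30     # feature dim, #concepts, #bases, #shots

feature_bases = rng.standard_normal((d, n))  # low-level feature dictionary
concept_bases = rng.random((k, n))           # paired concept distributions
temporal_corr = rng.random((k, k))           # P(concept_i | concept_j in a neighbor shot)
temporal_corr /= temporal_corr.sum(axis=1, keepdims=True)

def spatial_view(f, lam=0.05):
    """Spatial view: sparse-code a shot's feature over the bases, then
    transfer the sparse coefficients to a shot-specific concept
    distribution (the data-specific spatial correlation)."""
    coder = Lasso(alpha=lam, positive=True, max_iter=5000)
    coder.fit(feature_bases, f)              # f ~= feature_bases @ a, a sparse
    dist = concept_bases @ coder.coef_
    return dist / max(dist.sum(), 1e-12)

def temporal_view(scores, beta=0.3):
    """Temporal view: adjust each shot's scores using inter-concept
    correlations propagated from its neighboring shots."""
    adjusted = scores.copy()
    for t in range(len(scores)):
        nbrs = [scores[t - 1]] if t > 0 else []
        if t + 1 < len(scores):
            nbrs.append(scores[t + 1])
        ctx = np.mean([temporal_corr @ s for s in nbrs], axis=0)
        adjusted[t] = (1 - beta) * scores[t] + beta * ctx
    return adjusted

# Initial per-shot scores from baseline concept detectors (simulated here).
features = rng.standard_normal((T, d))
initial = rng.random((T, k))

spatial = np.stack([spatial_view(f) for f in features])
refined = temporal_view(0.5 * initial + 0.5 * spatial)  # simple linear fusion
print(refined.shape)  # (30, 20)
```

The sparse-coding step stands in for the paper's concept correlation bases: each shot's feature is reconstructed from a few basis atoms, and the same sparse weights select the paired concept distributions, yielding a correlation estimate specific to that shot rather than a single generic one shared by all shots.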
