TRECVID 2008 Participation by MCG-ICT-CAS

For the TRECVID 2008 concept detection task, we focus principally on three aspects. (1) Early fusion of texture, edge, and color features into TECM, short for the combination of TF*IDF weights based on SIFT features, the Edge Histogram, and Color Moments. (2) To improve training efficiency and to exploit knowledge of the relationships between concepts or of hidden sub-domains more easily, we propose a novel method based on Latent Dirichlet Allocation (LDA): LDA-based multiple SVMs (LDASVM). We first use LDA to cluster all keyframes into topics, assigning each keyframe to the topic given by the maximum element of its topic-simplex representation vector (TRV). We then train, for each concept, one SVM on the annotated data of each topic. Unlike multi-bag SVM, which uses all positive samples in the whole training set, we use only the positive samples in the current topic in order to preserve sample separability, and we ignore topics with too few positive samples. When testing a keyframe for a given concept, we combine the outputs of the per-topic SVMs using the keyframe's TRV as the weight vector instead of equal weights. (3) Introduction of Pseudo Relevance Feedback (PRF) into our concept detection system to make the re-trained models more adaptive to the test data. Unlike existing PRF techniques in text and video retrieval, we propose a preliminary strategy that exploits the visual features of the positive training samples to improve the quality of the pseudo-positive samples. Experimental results demonstrate that the proposed LDASVM approach is both effective and efficient.
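The early-fusion step in (1) can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: TF*IDF is applied to per-keyframe counts of quantized SIFT visual words (the visual vocabulary is assumed to exist already), the color moments are the mean, standard deviation, and cube-rooted skewness of each channel, the edge histogram is passed in as a precomputed vector, and each block is L2-normalized before concatenation. The normalization scheme and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def tfidf(counts: Sequence[int], doc_freq: Sequence[int], n_docs: int) -> List[float]:
    """TF*IDF weights for one keyframe's visual-word histogram (hypothetical helper)."""
    total = sum(counts)
    weights = []
    for c, df in zip(counts, doc_freq):
        tf = c / total if total else 0.0          # term frequency within the keyframe
        idf = math.log(n_docs / df) if df else 0.0  # inverse document frequency
        weights.append(tf * idf)
    return weights


def color_moments(pixels: Sequence[Sequence[float]]) -> List[float]:
    """Mean, standard deviation and cube-rooted skewness of each color channel."""
    n = len(pixels)
    feats: List[float] = []
    for ch in range(len(pixels[0])):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        m3 = sum((v - mean) ** 3 for v in vals) / n
        # Cube root that keeps the sign of the third central moment.
        feats += [mean, math.sqrt(var), math.copysign(abs(m3) ** (1 / 3), m3)]
    return feats


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def early_fuse(*blocks: Sequence[float]) -> List[float]:
    """Early fusion (assumed scheme): L2-normalize each block, then concatenate."""
    fused: List[float] = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused
```

Under these assumptions, a keyframe's fused vector would be `early_fuse(tfidf(...), edge_hist, color_moments(...))`, where `edge_hist` comes from a separate edge histogram extractor. Normalizing each block first keeps a long block (such as the visual-word histogram) from dominating the shorter ones purely by dimensionality.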

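The LDASVM training and fusion steps in (2) can be sketched for a single concept as follows. This is a minimal sketch under several assumptions that are not from the paper: the TRVs are taken as already inferred by some LDA model (LDA inference is not shown), labels are +1/-1, each per-topic SVM is a linear SVM trained with Pegasos-style stochastic subgradient descent as a stand-in for whatever SVM package and kernel were actually used, raw decision values are fused, and `min_positives` is a hypothetical threshold for "too few positive samples". All function names are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of a linear SVM


def assign_topic(trv: Vector) -> int:
    """Hard assignment: index of the largest entry of a keyframe's TRV."""
    return max(range(len(trv)), key=lambda k: trv[k])


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Model:
    """Linear SVM via Pegasos-style SGD on the L2-regularized hinge loss.

    Stand-in for the SVM actually used; the bias is folded in as an extra
    constant feature (and is therefore regularized too).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x = list(xs[i]) + [1.0]
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer step)
            if margin < 1.0:  # hinge loss is active: step towards y * x
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, x)]
    return w[:-1], w[-1]


def decision(model: Model, x: Vector) -> float:
    """Raw SVM decision value w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_ldasvm(features: Sequence[Vector], trvs: Sequence[Vector],
                 labels: Sequence[int], min_positives: int = 2) -> Dict[int, Model]:
    """Train one SVM per LDA topic for a single concept (labels are +1/-1)."""
    buckets: Dict[int, List[int]] = {}
    for i, trv in enumerate(trvs):
        buckets.setdefault(assign_topic(trv), []).append(i)
    models: Dict[int, Model] = {}
    for topic, idx in buckets.items():
        ys = [labels[i] for i in idx]
        if sum(1 for y in ys if y > 0) < min_positives:
            continue  # too few positives in this topic: no model for it
        # Only this topic's keyframes (its own positives and negatives) are used.
        models[topic] = train_linear_svm([features[i] for i in idx], ys)
    return models


def score_ldasvm(models: Dict[int, Model], feature: Vector, trv: Vector) -> float:
    """Fuse per-topic SVM outputs using the keyframe's TRV entries as weights."""
    return sum(trv[k] * decision(m, feature) for k, m in models.items())
```

In this sketch, topics without a model contribute nothing to the fused score, so a keyframe whose TRV mass lies mostly on skipped topics receives a score close to zero; whether the original system renormalized the weights in that case is not stated.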