TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS

We participated in the high-level feature extraction task of TRECVID 2007, and this paper describes our system for that task. For feature extraction, we propose a bag-of-features method based on the Earth Mover's Distance (EMD) to exploit visual and spatial information, and we use WordNet to expand the semantic meaning of text and thereby improve the generalization of the detectors. We also explore audio features and extract motion cues in the compressed domain to detect concepts that are strongly associated with audio or motion. We use the Ordered Weighted Averaging (OWA) fusion method to combine the SVM-based multi-modal concept detection results. Experimental results show that our methods are effective.
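To make the fusion step concrete, the following is a minimal sketch of a generic OWA operator applied to per-modality detector scores for a single shot. It is not the system's actual implementation: the function name `owa_fuse`, the example scores, and the weight vectors are all assumptions chosen for illustration, since the abstract does not state how the weights were set. The defining property of OWA is that weights attach to rank positions after sorting the scores in descending order, not to particular modalities.

```python
import math


def owa_fuse(scores, weights):
    """Fuse detector scores with an Ordered Weighted Averaging operator.

    scores  -- confidence values from different detectors (hypothetical)
    weights -- non-negative weights that sum to 1; weights[0] applies to the
               largest score, weights[1] to the second largest, and so on
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    # Weights are tied to rank positions, so sort the scores first.
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical scores from visual, audio, and motion detectors for one shot.
shot_scores = [0.2, 0.9, 0.5]
fused = owa_fuse(shot_scores, [0.5, 0.3, 0.2])
```

Choosing the weights moves the operator between familiar fusion rules: `[1, 0, 0]` yields the maximum score, `[0, 0, 1]` the minimum, and uniform weights the arithmetic mean.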
