TRECVID 2006 by NUS-I2R

NUS and I2R jointly participated in the high-level feature (HLF) extraction and automated search tasks of TRECVID 2006. In both tasks, we made use of only the standard annotation results available through TRECVID. For the HLF task, we developed two methods for automated concept annotation: (a) a fully machine-learning approach using SVM, LDF and GMM classifiers; and (b) a bi-gram model for pattern discovery and matching. For the automated search task, our emphases this year were: 1) integration of HLFs, in which query analysis matches each query to possibly related HLFs and fuses the results from the various groups participating in the HLF task; and 2) integration of the event structures implicitly present in news video into time-dependent, event-based retrieval. The proposed generic framework combines various multimodal features (including HLFs) with the implicit temporal and event structures to support precise news video retrieval.

1. HIGH LEVEL FEATURE EXTRACTION TASK

1.1 Visual Feature Extraction

The visual features used in our systems are listed below:
- Global color correlogram (GCC) in HSV space: 324 dimensions.
- Co-occurrence texture extracted from the global gray-level co-occurrence matrix (GLCM): 64 dimensions.
- 3-D global color histogram in HSV (HSV): 162 dimensions.
- 3-D global color histogram in RGB (RGB): 125 dimensions.
- 3-D global color histogram in LAB (LAB): 125 dimensions.
- Gabor-filter texture feature (2 scales, 12 orientations) (Gabor): 48 dimensions, extracted from the 5×5 patches into which the image is uniformly segmented.

The textual features used are described in Section 2.

1.2 Machine Learning Approach Using SVM, LDF and GMM

For each type of extracted feature, we train an SVM classifier (SVM), a linear discriminative function (LDF) classifier, or a Gaussian mixture model (GMM). The details are summarized in Table 1. In total, 9 visual classifiers are trained for each concept.
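As an illustration of the visual features in Section 1.1, a 3-D HSV color histogram of the stated dimensionality can be computed as sketched below. The paper only gives the total of 162 dimensions; the 18×3×3 bin split over (H, S, V) and the L1 normalization are our assumptions for this sketch.

```python
import numpy as np

def hsv_histogram(hsv_pixels, bins=(18, 3, 3)):
    """3-D global color histogram in HSV space.

    hsv_pixels: (N, 3) array with H in [0, 360), S and V in [0, 1].
    The (18, 3, 3) bin split (18 * 3 * 3 = 162 dimensions) is an
    assumption; the paper states only the total dimensionality.
    """
    hist, _ = np.histogramdd(
        hsv_pixels,
        bins=bins,
        range=[(0, 360), (0, 1), (0, 1)],
    )
    hist = hist.flatten()                 # 162-dimensional vector
    return hist / max(hist.sum(), 1.0)    # L1-normalize (assumed)
```

The RGB and LAB histograms in the list (125 dimensions each) would follow the same pattern with a 5×5×5 bin split.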
The SVMs are trained using the SVM tool of (Joachims, 2002), and the LDF and GMM classifiers are trained using the AUC-maximized learning algorithm (Gao & Sun, 2006). Based on our experiments on the TRECVID 2005 development set, we found that the AUC-maximized learning algorithm works better than the best-tuned SVM system.

Table 1: Description of classifiers (+: the classifier is available for the feature)

         SVM   LDF   GMM
GCC       +     +
GLCM      +     +
HSV       +     +
RGB       +
LAB       +
Gabor     +

For each image, we then use the 351 (9×39) scores produced by these classifiers over all concepts as a feature vector and train an SVM classifier for each concept. PCA or LSA is also applied to reduce the dimensionality of the selected feature space. We follow the approach of (Gao et al., 2007), which uses a knowledge-based method to retain only the informative components. This is done by extracting the pairwise concept associations among the 39 concepts from the development set. The strength of the association, Str, between the target concept A and a concept B is defined as,

* Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
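The second-level learning described above — concatenating the per-concept scores of the 9 base classifiers into one 351-dimensional vector per shot and training a per-concept classifier on it — can be sketched as follows. The hinge-loss sub-gradient trainer is a generic stand-in for the SVM tool the paper uses, and the hyper-parameters (`lr`, `epochs`, `lam`) are illustrative assumptions.

```python
import numpy as np

N_CLASSIFIERS = 9   # per Table 1: one classifier per (feature, learner) pair
N_CONCEPTS = 39     # size of the TRECVID concept lexicon used here

def stack_scores(score_maps):
    """Concatenate per-concept scores from the 9 base classifiers
    into one 351-dimensional (9 * 39) meta-feature vector per shot.

    score_maps: list of 9 arrays, each (n_shots, 39), holding one base
    classifier's confidence for every concept.
    """
    assert len(score_maps) == N_CLASSIFIERS
    return np.concatenate(score_maps, axis=1)   # (n_shots, 351)

def train_meta_svm(X, y, lr=0.1, epochs=200, lam=1e-3):
    """Minimal linear SVM trained by sub-gradient descent on the hinge
    loss -- an illustrative stand-in, not the paper's actual tool.
    Labels y must be in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1.0                    # margin violators
        if mask.any():
            gw = lam * w - (y[mask][:, None] * X[mask]).mean(axis=0)
            gb = -y[mask].mean()
        else:                                   # only regularization left
            gw = lam * w
            gb = 0.0
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Dimensionality reduction (PCA/LSA or the knowledge-based component selection of (Gao et al., 2007)) would be applied to the stacked 351-dimensional vectors before this second-level training.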

REFERENCES

[1] Sheng Gao, et al. Exploiting Concept Association to Boost Multimedia Semantic Concept Detection, 2007, IEEE ICASSP '07.

[2] Thorsten Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, 2002, The Kluwer International Series in Engineering and Computer Science.

[3] Gang Wang, et al. TRECVID 2004 Search and Feature Extraction Task by NUS PRIS, 2004, TRECVID.

[4] Jun Yang, et al. CMU Informedia's TRECVID 2005 Skirmishes, 2005, TRECVID.

[5] Alan F. Smeaton, et al. TRECVID 2004 Experiments in Dublin City University, 2004, TRECVID.

[6] Tat-Seng Chua, et al. Multi-faceted Contextual Model for Person Identification in News Video, 2006, 12th International Multi-Media Modelling Conference.

[7] Hinrich Schütze, et al. Foundations of Statistical Natural Language Processing, 1999.

[8] Chin-Hui Lee, et al. The Segmentation of News Video into Story Units, 2002, IEEE ICME.

[9] Christian Petersohn. Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System, 2004, TRECVID.

[10] George A. Miller, et al. Introduction to WordNet: An On-line Lexical Database, 1990.

[11] Grace Hui Yang, et al. Structured Use of External Knowledge for Event-based Open Domain Question Answering, 2003, SIGIR.

[12] Dennis Koelma, et al. The MediaMill TRECVID 2008 Semantic Video Search Engine, 2008, TRECVID.

[13] Jin Zhao, et al. Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting, 2006, CIVR.

[14] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[15] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[16] Jean-Luc Gauvain, et al. The LIMSI Broadcast News Transcription System, 2002, Speech Communication.

[17] Lie Lu, et al. Content-based Audio Segmentation Using Support Vector Machines, 2001, IEEE ICME 2001.

[18] Tat-Seng Chua, et al. TRECVID 2005 by NUS PRIS, 2005, TRECVID.

[19] Hui Zhang, et al. The Segmentation of News Video into Story Units, 2005, WAIM.

[20] Shih-Fu Chang, et al. Visual Cue Cluster Construction via Information Bottleneck Principle and Kernel Density Estimation, 2005, CIVR.

[21] John R. Smith, et al. IBM Research TRECVID-2009 Video Retrieval System, 2009, TRECVID.

[22] Sheng Gao, et al. Classifier Optimization for Multimedia Semantic Concept Detection, 2006, IEEE ICME.

[23] Qi Tian, et al. News Video Search with Fuzzy Event Clustering Using High-level Features, 2006, MM '06.

[24] Chin-Hui Lee, et al. Fusion of Region and Image-Based Techniques for Automatic Image Annotation, 2007, MMM.