PKU-ICST at TRECVID 2009: High Level Feature Extraction and Search

We participate in two tasks of TRECVID 2009: high-level feature extraction (HLFE) and search. This paper presents our approaches and results in both tasks. In the HLFE task, we mainly focus on effective feature representation, learning from imbalanced data, and fusion across different data sets. For feature representation, we adopt five basic visual features and six keypoint-based bag-of-words (BoW) features, and combine them to represent each keyframe image. For imbalance learning, we propose two methods, OnUm and concept category. For fusion across data sets, we use three training sets: (1) the TRECVID 2009 training data set (Tv09), (2) the TRECVID 2005 training data set (Tv05), and (3) Flickr images. In the search task, we participate in both automatic search and manual search. We explore a multimodal feature representation that includes visual features, concept-based features, audio features and face features. Based on these features, two retrieval methods are jointly adopted: pair-wise similarity measurement and learning-based ranking. We achieve good results in both tasks. In the HLFE task, official evaluation shows that our team ranks 2nd in type A and 1st in types C, a and c. In the search task, official evaluation shows that our team ranks 2nd in automatic search and 1st in manual search.
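
The abstract does not specify how detection scores obtained from the three training sets (Tv09, Tv05, Flickr) are combined. Purely as an illustration, the Python sketch below shows one common option: weighted late fusion of min-max normalized classifier scores, followed by ranking the shots. The function names, the weights and the choice of min-max normalization are assumptions for this sketch, not the fusion method used in this paper, and the scores in the usage example are toy numbers rather than experimental results.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def late_fuse(source_scores: Dict[str, List[float]],
              weights: Dict[str, float]) -> List[float]:
    """Weighted average of normalized scores from several training sources.

    Every source must score the same shots in the same order. The weights
    are hypothetical and would normally be tuned on held-out data.
    """
    lengths = {len(s) for s in source_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all sources must score the same number of shots")
    n = lengths.pop()
    total_w = sum(weights[name] for name in source_scores)
    fused = [0.0] * n
    for name, scores in source_scores.items():
        # Normalize each source separately so differently scaled outputs are comparable.
        for i, s in enumerate(min_max_normalize(scores)):
            fused[i] += weights[name] * s
    return [f / total_w for f in fused]


def rank_shots(fused: List[float]) -> List[int]:
    """Return shot indices sorted from most to least likely positive."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Toy scores for three shots from hypothetical per-source classifiers.
    scores = {
        "Tv09": [0.9, 0.2, 0.5],
        "Tv05": [0.1, 0.3, 0.2],
        "Flickr": [2.0, 0.0, 1.0],
    }
    weights = {"Tv09": 0.5, "Tv05": 0.25, "Flickr": 0.25}
    fused = late_fuse(scores, weights)
    print(rank_shots(fused))  # [0, 2, 1]
```

Min-max normalization is used here only because raw outputs of classifiers trained on different data sets need not share a scale; rank-based or probability-calibrated fusion would be equally plausible choices.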