UCF @ TRECVID 2009: High-Level Feature Extraction

This year, the University of Central Florida (UCF) participated in the high-level feature extraction (HLF) task. The goal of this task is to identify the video shots that contain specific concepts such as “bus,” “person playing soccer,” and “boat/ship.” In our submissions, we focused on the large imbalance between positive and negative training examples. Specifically, we implemented a method called bootstrapping, which identifies the best subset of negative examples to train on. In our experiments, bootstrapping significantly lowered the probability of false alarm while also improving the probability of detection. We also explored different word weighting techniques: in the bag-of-words approach, some words are more discriminative than others and should therefore be weighted more heavily. This task served as a project for several students in UCF's Research Experience for Undergraduates (REU) program.
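The abstract does not spell out the bootstrapping procedure. The following is a minimal sketch, assuming the common hard-negative-mining form: start from a random subset of negatives, train, score the unused negatives, and move the highest-scoring ones (the likely false alarms) into the training set before retraining. The nearest-centroid scorer `centroid_scorer` is a hypothetical stand-in for the real classifier (typically an SVM), and `bootstrap_negatives`, `init_size`, `add_per_round`, and `rounds` are illustrative names, not settings from the submitted runs.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical stand-in for an SVM: score = squared distance to the
    negative centroid minus squared distance to the positive centroid
    (higher means more positive-looking)."""
    def centroid(vs: List[Vector]) -> List[float]:
        return [sum(col) / len(vs) for col in zip(*vs)]

    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cp, cn = centroid(pos), centroid(neg)
    return lambda v: sqdist(v, cn) - sqdist(v, cp)


def bootstrap_negatives(
    pos: List[Vector],
    neg: List[Vector],
    train: Callable[[List[Vector], List[Vector]], Scorer],
    init_size: int,
    add_per_round: int,
    rounds: int,
    seed: int = 0,
) -> List[int]:
    """Return the indices of the negatives chosen for the final training set."""
    rng = random.Random(seed)
    # Start from a random subset of the (much larger) negative pool.
    selected = set(rng.sample(range(len(neg)), min(init_size, len(neg))))
    for _ in range(rounds):
        scorer = train(pos, [neg[i] for i in selected])
        pool = [i for i in range(len(neg)) if i not in selected]
        if not pool:
            break
        # Hardest negatives = highest scores, i.e. the likeliest false alarms.
        pool.sort(key=lambda i: scorer(neg[i]), reverse=True)
        selected.update(pool[:add_per_round])
    return sorted(selected)


# Toy data: two positives, 20 easy negatives, and two hard negatives
# placed close to the positives.
pos = [[1.0, 1.0], [1.2, 0.9]]
neg = [[-1.0 - 0.01 * k, -1.0] for k in range(20)] + [[0.9, 0.9], [0.8, 1.0]]
chosen = bootstrap_negatives(pos, neg, centroid_scorer,
                             init_size=3, add_per_round=2, rounds=1)
print(chosen)  # always includes the hard negatives at indices 20 and 21
```

Because only the hardest negatives are added each round, the training set stays small and closer to balanced while still covering the regions of feature space where false alarms occur.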

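As an illustration of word weighting, the sketch below applies the standard tf-idf scheme to bag-of-words histograms. It is one common choice and not necessarily the scheme used in our submissions. A word that occurs in every keyframe gets weight zero because it cannot separate one shot from another. The helper names `idf_weights` and `tfidf` are hypothetical.

```python
import math
from typing import List


def idf_weights(histograms: List[List[int]]) -> List[float]:
    """Inverse document frequency of each word: log(N / df), where N is the
    number of keyframes and df the number whose histogram contains the word."""
    n = len(histograms)
    weights = []
    for w in range(len(histograms[0])):
        df = sum(1 for h in histograms if h[w] > 0)
        # Words never seen in training get weight 0 instead of dividing by 0.
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def tfidf(hist: List[int], idf: List[float]) -> List[float]:
    """Term frequency (count / total words in this keyframe) times idf."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [(count / total) * weight for count, weight in zip(hist, idf)]


# Three toy keyframes over a 3-word vocabulary.
train = [[2, 0, 1], [1, 1, 0], [3, 0, 0]]
idf = idf_weights(train)
print(tfidf(train[0], idf))  # word 0 appears everywhere, so its weight is 0
```

Note that idf is unsupervised: it rewards rare words as a proxy for discriminative power. A supervised scheme would instead use the concept labels to decide which words to emphasize.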