Oxford TRECVID 2008 - Notebook paper

The Oxford/IIIT team participated in the high-level feature extraction and interactive search tasks. A vision only approach was used for both tasks, with no use of the text or audio information. For the high-level feature extraction task, we used two different approaches, both based on a combination of visual features. One used a SVM classifier using a linear combination of kernels, the other used a random forest classifier. For both methods, we trained all high-level features using publicly available annotations [24]. The advantage of the random forest classifier is the speed of training and testing. In addition, for thepeople feature, we took a more targeted approach. We used a real-time face detector and an upper body detector, in both cases running on every frame. One run C OXVGG 4 4 was submitted using only Random forest approach for all concepts except two people. It performed best on Dog and hand, and came over the median for all the classes except kitchen and airplane. In the interactive search task, our team came third overall with an mAP of 0.158. The system used was identical to last year with the only change being a source of accurate upper body detections.

[1]  Michael Isard,et al.  General Theory , 1969 .

[2]  Antonio Criminisi,et al.  Object Class Recognition at a Glance , 2006 .

[3]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[5]  Andrew Blake,et al.  A sparse probabilistic learning algorithm for real-time tracking , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Andrew Zisserman,et al.  Oxford TRECVID 2006 - Notebook paper , 2006, TRECVID.

[7]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[8]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[9]  Raphaël Marée,et al.  Random subwindows for robust image classification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Stéphane Ayache,et al.  Video Corpus Annotation Using Active Learning , 2008, ECIR.

[12]  Andrew Zisserman,et al.  Regression and classification approaches to eye localization in face images , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[13]  Luc Van Gool,et al.  Real-time affine region tracking and coplanar grouping , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[14]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[15]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[16]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Vincent Lepetit,et al.  Fast Keypoint Recognition in Ten Lines of Code , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.