Shanghai Jiao Tong University participation in high-level feature extraction and surveillance event detection at TRECVID 2009

In this paper, we describe our participation in the high-level feature extraction, automatic search, and surveillance event detection tasks of the TRECVID 2009 evaluation. For high-level feature extraction, we establish a common feature set for all the predefined concepts, comprising global and local features extracted from the keyframes. For concepts related to person activity, space-time interest points are also used. Detection of regions of interest (ROI) and faces is needed for some special concepts, such as playing instrument and female face close-up. Classifiers are trained on these features, and linear weighted fusion of the classification results is used as the baseline. In particular, simple average fusion works well. Further, ASR and IB re-ranking are used to improve overall performance. We submitted the following six runs:

- A_SJTU_ICIP_Lab317_1: Average fusion of classification results using global and local features; SVM classifiers are trained on TRECVID 2009 development data
- A_SJTU_ICIP_Lab317_2: Linear weighted fusion of classification results using global and local features; SVM classifiers are trained on TRECVID 2009 development data
- A_SJTU_ICIP_Lab317_3: Max of RUN1 and RUN2, re-ranked using ASR
- A_SJTU_ICIP_Lab317_4: Max of RUN1 and RUN2, re-ranked using IB re-ranking
- A_SJTU_ICIP_Lab317_5: Based on the result of RUN3, combining ASR and IB re-ranking
- A_SJTU_ICIP_Lab317_6: Max of all runs

In event detection, trajectory features obtained from human tracking and optical flow computation, together with local appearance and shape features, are employed in event model training. For particular event detection tasks, several detection rules are tested using HMMs, boosted classifiers, matching, and heuristic settings. We provide detection results for eight of the 10 required events for performance evaluation.

- SJTU_2009_retroED_EVAL09_ENG_s-camera_p-baseline_1: Event detection based on human tracking, motion detection, and gesture recognition
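
The fusion schemes named in the run descriptions above (average, linear weighted, and max) can be illustrated with a minimal Python sketch. It is not the submitted system: the function names, the example scores, and the weights are hypothetical, and it assumes each classifier outputs one confidence score per shot on a comparable scale (for example, SVM probability estimates in [0, 1]). The abstract does not state how the weights were chosen or whether scores were normalized.

```python
import numpy as np


def average_fusion(scores):
    """Average per-shot scores over classifiers.

    scores: array of shape (n_classifiers, n_shots).
    """
    return np.mean(np.asarray(scores, dtype=float), axis=0)


def weighted_fusion(scores, weights):
    """Linear weighted fusion; weights are normalized to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # (n_classifiers,) @ (n_classifiers, n_shots) -> (n_shots,)
    return weights @ scores


def max_fusion(*runs):
    """Element-wise maximum over the fused scores of several runs."""
    return np.max(np.stack([np.asarray(r, dtype=float) for r in runs]), axis=0)


# Hypothetical scores: two classifiers (e.g. one on global features,
# one on local features) evaluated on two shots.
scores = np.array([[0.2, 0.8],
                   [0.6, 0.4]])

run1 = average_fusion(scores)                 # analogous to RUN1
run2 = weighted_fusion(scores, [3.0, 1.0])    # analogous to RUN2, assumed weights
combined = max_fusion(run1, run2)             # "Max of RUN1 and RUN2"

# Shots are then ranked by descending fused score.
ranking = np.argsort(-combined)
```

Taking the maximum only makes sense when the runs being combined are on the same score scale, which is why the sketch assumes normalized confidences.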
