Detecting Human Actions in Surveillance Videos

This notebook paper summarizes Team NEC-UIUC's approaches for the TRECVid 2009 Evaluation of Surveillance Event Detection. Our submissions include two types of systems. The first employs a brute-force search: a binary classifier tests each space-time location in the video to decide whether a specific event occurs there. The second uses human detection and tracking to avoid the costly brute-force search, and evaluates the resulting candidate space-time cubes by combining 3D convolutional neural networks (CNN) with SVM classifiers based on bag-of-words local features to detect the presence of events of interest. Through thorough cross-validation on the development set, we select combining weights and thresholds that minimize the detection cost rate (DCR). Our systems achieve good performance on event categories that involve the actions of a single person, e.g. CellToEar, ObjectPut, and Pointing.
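The selection of combining weights and thresholds described above can be illustrated with a small grid search. This is only a sketch under stated assumptions, not the team's actual procedure: the linear fusion rule, the function names (`dcr`, `fuse`, `select_weight_and_threshold`), and the toy scores are hypothetical. The cost is assumed to take the form DCR = P_miss + beta * R_FA, with R_FA in false alarms per hour and beta = 0.005 assumed from the TRECVid event detection cost settings. Each candidate cube is treated as a single detection, which simplifies away the temporal alignment of detections to reference events used in the official scoring.

```python
from typing import Sequence, Tuple


def dcr(n_miss: int, n_false_alarm: int, n_true: int, hours: float,
        beta: float = 0.005) -> float:
    """Detection cost rate: miss probability plus weighted false alarm rate.

    beta = 0.005 is an assumed value, not taken from the abstract.
    """
    p_miss = n_miss / n_true          # fraction of true events not detected
    r_fa = n_false_alarm / hours      # false alarms per hour of video
    return p_miss + beta * r_fa


def fuse(cnn: Sequence[float], svm: Sequence[float], w: float) -> list:
    """Hypothetical linear fusion of per-candidate CNN and SVM scores."""
    return [w * c + (1.0 - w) * s for c, s in zip(cnn, svm)]


def select_weight_and_threshold(
    cnn: Sequence[float],
    svm: Sequence[float],
    labels: Sequence[bool],
    n_true: int,
    hours: float,
    weights: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (best_dcr, weight, threshold) over a grid of candidates."""
    best = None
    for w in weights:
        fused = fuse(cnn, svm, w)
        for t in thresholds:
            detected = [score >= t for score in fused]
            hits = sum(1 for d, y in zip(detected, labels) if d and y)
            false_alarms = sum(1 for d, y in zip(detected, labels) if d and not y)
            cost = dcr(n_true - hits, false_alarms, n_true, hours)
            # Keep the first setting that reaches the lowest cost.
            if best is None or cost < best[0]:
                best = (cost, w, t)
    return best


# Toy development-set scores for four candidate cubes (made-up numbers).
labels = [True, False, True, False]
cnn_scores = [0.9, 0.1, 0.4, 0.6]
svm_scores = [0.5, 0.2, 0.9, 0.3]

best_cost, best_w, best_t = select_weight_and_threshold(
    cnn_scores, svm_scores, labels, n_true=2, hours=1.0,
    weights=[0.0, 0.5, 1.0], thresholds=[0.3, 0.5, 0.7],
)
print(best_cost, best_w, best_t)
```

In a cross-validation setting, this search would be run on the training folds and the chosen weight and threshold scored on the held-out fold, separately for each event category.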
