Action Detection in Crowd

For automated surveillance, it is useful to detect specific actions performed by people in busy natural environments. This differs from and thus more challenging than the intensively studied action recognition problem in that for action detection in crowd an action of interest is often overwhelmed by large number of background activities of other objects in the scene. Motivated by the success of sliding-window based 2D object detection approaches, in this paper, we propose to tackle the problem by learning a discriminative classifier from annotated 3D action cuboids to capture the intra-class variation, and sliding 3D search windows for detection. The key innovation of our method is a novel greedy k nearest neighbour algorithm for automated annotation of positive training data, by which an action detector can be learned with only a single training sequence being annotated thus greatly alleviating the tedious and unreliable 3D manual annotation. Extensive experiments on real-world action detection datasets demonstrate that our detector trained with minimal annotation can achieve comparable results to that learned with full annotation, and outperforms existing methods.

[1]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[2]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[4]  i-LIDS Team,et al.  Imagery Library for Intelligent Detection Systems (i-LIDS); A Standard for Testing Video Based Detection Systems , 2006, Proceedings 40th Annual 2006 International Carnahan Conference on Security Technology.

[5]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[6]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[7]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[9]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[11]  Yang Wang,et al.  Efficient Human Action Detection Using a Transferable Distance Function , 2009, ACCV.

[12]  Jintao Li,et al.  Hierarchical spatio-temporal context modeling for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Ying Wu,et al.  Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Martial Hebert,et al.  Trajectons: Action recognition through the motion analysis of tracked features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[16]  Yongdong Zhang,et al.  Localizing volumetric motion for action recognition in realistic videos , 2009, MM '09.

[17]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yihong Gong,et al.  Action detection in complex scenes with spatial and temporal ambiguities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..