Retrieving Actions in Group Contexts

We develop methods for retrieving actions from surveillance video using contextual feature representations. Our approach makes two contributions. First, we introduce a new feature representation called the action context (AC) descriptor, which encodes not only the action of an individual person in the video but also the behaviour of other people nearby. This representation is motivated by the observation that what other people are doing provides useful cues for recognizing the action of each individual. Second, we formulate the problem as a retrieval/ranking task, in contrast to previous work that treats it as action classification. We develop an action retrieval technique based on rank-SVM, a state-of-the-art approach for solving ranking problems. We evaluate our approach on two real-world datasets. The first consists of videos of multiple people performing several group activities; the second consists of surveillance videos from a nursing home environment. Our experimental results show the advantage of using contextual information for disambiguating different actions, and the benefit of using rank-SVMs instead of regular SVMs for video retrieval problems.
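To make the ranking formulation concrete, the following is a minimal sketch of a linear ranking SVM trained on the pairwise hinge loss, the standard formulation usually meant by rank-SVM. It is not the solver or feature pipeline used in this work, which the abstract does not specify. The function names `train_rank_svm` and `rank`, the hyperparameters, and the two-dimensional toy feature vectors (standing in for AC descriptors) are all assumptions made for illustration.

```python
import random


def train_rank_svm(features, relevance, c=1.0, epochs=200, lr=0.05, seed=0):
    """Linear ranking SVM via subgradient descent on the pairwise hinge loss.

    Approximately minimises
        0.5 * ||w||^2 + c * sum over (i, j) with relevance[i] > relevance[j]
                            of max(0, 1 - w . (x_i - x_j)).
    """
    dim = len(features[0])
    # One difference vector x_i - x_j per preference pair (i should outrank j).
    diffs = [
        [a - b for a, b in zip(features[i], features[j])]
        for i in range(len(features))
        for j in range(len(features))
        if relevance[i] > relevance[j]
    ]
    w = [0.0] * dim
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # Regulariser gradient, spread evenly across the pairs.
                grad = w[k] / len(diffs)
                # Hinge term is active only when the pair violates the margin.
                if margin < 1.0:
                    grad -= c * d[k]
                w[k] -= lr * grad
    return w


def rank(features, w):
    """Return item indices sorted by decreasing score w . x."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: four clips, two made-up feature dimensions.
# Clips 0 and 2 are relevant to the query action; clips 1 and 3 are not.
features = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.5]]
relevance = [1, 0, 1, 0]

w = train_rank_svm(features, relevance)
order = rank(features, w)
print(order[:2])  # the two relevant clips are ranked first
```

Unlike a regular SVM, which is trained to separate relevant from irrelevant examples, this objective penalises only incorrectly ordered pairs, so it optimises the ordering of the retrieved list directly.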
