Detecting Group Activities With Multi-Camera Context

Human group activities detection in multi-camera CCTV surveillance videos is a pressing demand on smart surveillance. Previous works on this topic are mainly based on camera topology inference that is hard to apply to real-world unconstrained surveillance videos. In this paper, we propose a new approach for multi-camera group activities detection. Our approach simultaneously exploits intra-camera and inter-camera contexts without topology inference. Specifically, a discriminative graphical model with hidden variables is developed. The intra-camera and inter-camera contexts are characterized by the structure of hidden variables. By automatically optimizing the structure, the contexts are effectively explored. Furthermore, we propose a new spatiotemporal feature, named vigilant area (VA), to characterize the quantity and appearance of the motion in an area. This feature is effective for group activity representation and is easy to extract from a dynamic and crowded scene. We evaluate the proposed VA feature and discriminative graphical model extensively on two real-world multi-camera surveillance video data sets, including a public corpus consisting of 2.5 h of videos and a 468-h video collection, which, to the best of our knowledge, is the largest video collection ever used in human activity detection. The experimental results demonstrate the effectiveness of our approach.

[1]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[2]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[3]  Thierry Artières,et al.  Large margin training for hidden Markov models with partially observed states , 2009, ICML '09.

[4]  Gang Hua,et al.  Semantic Model Vectors for Complex Video Event Recognition , 2012, IEEE Transactions on Multimedia.

[5]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[6]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[8]  Ferdinand van der Heijden,et al.  Efficient adaptive density estimation per image pixel for the task of background subtraction , 2006, Pattern Recognit. Lett..

[9]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[11]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[12]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[13]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[14]  Silvio Savarese,et al.  What are they doing? : Collective activity classification using spatio-temporal relationship among people , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[15]  S. Gong,et al.  Global Abnormal Behaviour Detection Using a Network of CCTV Cameras , 2008 .

[16]  Bir Bhanu,et al.  VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication , 2011 .

[17]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Alberto Del Bimbo,et al.  Recognizing human actions by fusing spatio-temporal appearance and motion descriptors , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[19]  Ting Yu,et al.  Group Level Activity Recognition in Crowded Environments across Multiple Cameras , 2010, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance.

[20]  Changsheng Xu,et al.  Generative Group Activity Analysis with Quaternion Descriptor , 2011, MMM.

[21]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[22]  Shaogang Gong,et al.  Modelling activity global temporal dependencies using Time Delayed Probabilistic Graphical Model , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Alexander G. Hauptmann,et al.  MoSIFT: Recognizing Human Actions in Surveillance Videos , 2009 .

[24]  Bingbing Ni,et al.  Recognizing human group activities with localized causalities , 2009, CVPR 2009.

[25]  Dimitrios Makris,et al.  Bridging the gaps between cameras , 2004, CVPR 2004.

[26]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[28]  W. Eric L. Grimson,et al.  Correspondence-Free Activity Analysis and Scene Modeling in Multiple Camera Views , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[31]  W. Eric L. Grimson,et al.  Inference of non-overlapping camera network topology by measuring statistical dependence , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[32]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[33]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Shaogang Gong,et al.  Activity based surveillance video content modelling , 2008, Pattern Recognit..

[35]  James L. Crowley,et al.  Probabilistic recognition of activity using local appearance , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[36]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[37]  Jean Ponce,et al.  Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[38]  Ming Yang,et al.  Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor , 2009, ACM Multimedia.

[39]  Ze-Nian Li BEYOND ACTIONS : DISCRIMINATIVE MODELS FOR CONTEXTUAL GROUP ACTIVITIES , 2010 .

[40]  Yihong Gong,et al.  Action detection in complex scenes with spatial and temporal ambiguities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Shaogang Gong,et al.  Multi-camera activity correlation analysis , 2009, CVPR.

[42]  Shin'ichi Satoh,et al.  Robust Recognition of Specific Human Behaviors in Crowded Surveillance Video Sequences , 2010, EURASIP J. Adv. Signal Process..

[43]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[44]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[45]  Jake K. Aggarwal,et al.  Stochastic Representation and Recognition of High-Level Group Activities , 2011, International Journal of Computer Vision.

[46]  Jake K. Aggarwal,et al.  Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[47]  Rémi Ronfard,et al.  Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst..