Multiview human activity recognition system based on spatiotemporal template for video surveillance system

Abstract. An efficient view-invariant framework for recognizing human activities from an input video sequence is presented. The proposed framework is composed of three consecutive modules: (i) detection and localization of people by background subtraction, (ii) creation of view-invariant spatiotemporal templates for different activities, and (iii) template matching for view-invariant activity recognition. The foreground objects present in a scene are extracted using change detection and background modeling. The view-invariant templates are constructed from motion history images and object shape information for the different human activities in a video sequence. The spatiotemporal templates of the various activities are matched using moment invariants and the Mahalanobis distance. The proposed approach is tested successfully on our own viewpoint dataset, the KTH action recognition dataset, the i3DPost multiview dataset, the MSR viewpoint action dataset, the VideoWeb multiview dataset, and the WVU multiview human action recognition dataset. The experimental results and analysis over the chosen datasets indicate that the proposed framework is robust, flexible, and efficient with respect to multiview activity recognition and to scale and phase variations.
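To make the three modules concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes a binary foreground mask is already available for each frame, builds a motion history image with a simple linear decay, describes the template with the first four Hu moment invariants, and matches it by Mahalanobis distance. The function names, the decay length `tau`, the choice of four invariants, and the per-class mean and covariance are all illustrative assumptions; the abstract does not specify them, and the object shape information it mentions is not modeled here.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One motion-history step: moving pixels are set to tau, all others decay by 1 (floor 0)."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

def hu_invariants(img):
    """First four Hu moment invariants of a grayscale template (assumed feature set)."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    if m00 == 0:
        raise ValueError("empty template")
    # Centering on the intensity centroid gives translation invariance.
    dx = xs - (xs * img).sum() / m00
    dy = ys - (ys * img).sum() / m00

    def eta(p, q):
        # Normalized central moment; the power of m00 gives scale invariance.
        return ((dx ** p) * (dy ** q) * img).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ])

def mahalanobis(x, mean, cov):
    """Mahalanobis distance; the pseudo-inverse tolerates a singular covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

def classify(features, class_stats):
    """Return the label whose (mean, cov) pair is nearest to the feature vector."""
    return min(class_stats, key=lambda label: mahalanobis(features, *class_stats[label]))
```

In use, `update_mhi` would be called once per frame with that frame's foreground mask, and `hu_invariants` of the final motion history image would be passed to `classify` together with per-activity statistics estimated from training templates.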
