Deep neural network model for group activity recognition using contextual relationship

Abstract In this paper, we present a contextual relationship-based learning model that uses deep neural networks to recognize the activities performed by a group of people in a video sequence. The proposed model learns context using a bottom-up approach, from individual human actions to group-level activity, and also learns from scene information. We build a deep convolutional neural network to capture human action-pose features from a given input video sequence. To capture group-level temporal flow changes, the aggregated action-pose features of the persons within the context area are fed to a deep recurrent neural network, which provides a spatio-temporal group descriptor. In addition, we build a scene-level convolutional neural network to extract scene-level features, which improves the performance of group activity recognition. A probabilistic inference model is added as an additional layer in the deep neural network to ensemble the models and provide a unified deep learning framework. Experimental results on the standard benchmark Collective Activity Dataset show the efficiency of the proposed model in group activity recognition. We also present results obtained by varying the learning parameters, the optimizers, and the recurrent neural network model, specifically long short-term memory and gated recurrent unit, on the same dataset.
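As a rough illustration of two steps named in the abstract, the sketch below pools per-person feature vectors inside a context area and then combines group-level and scene-level class probabilities. The abstract does not specify either operation, so everything here is an assumption made for illustration: the circular context area, the element-wise max pooling, the normalised-product fusion, and the names `context_pool` and `fuse` are hypothetical and are not the authors' method.

```python
import math


def context_pool(features, positions, anchor, radius):
    """Element-wise max over the feature vectors of people near `anchor`.

    Assumption: the context area is a circle of `radius` around `anchor`,
    and aggregation is max pooling. The abstract fixes neither choice.
    """
    selected = [
        feat
        for feat, pos in zip(features, positions)
        if math.dist(pos, anchor) <= radius
    ]
    if not selected:
        raise ValueError("no person falls inside the context area")
    # zip(*selected) iterates over feature dimensions across selected people.
    return [max(column) for column in zip(*selected)]


def fuse(group_probs, scene_probs):
    """Combine two class distributions by a normalised product.

    Assumption: this stands in for the probabilistic inference layer,
    whose actual form is not given in the abstract.
    """
    joint = [g * s for g, s in zip(group_probs, scene_probs)]
    total = sum(joint)
    if total == 0:
        raise ValueError("the two distributions share no supported class")
    return [j / total for j in joint]


# Hypothetical toy data: three people with 2-D features and image positions.
features = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]]
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]

pooled = context_pool(features, positions, anchor=(0.0, 0.0), radius=2.0)
fused = fuse([0.6, 0.4], [0.5, 0.5])
print(pooled)
print(fused)
```

In a full pipeline of the kind the abstract describes, one pooled vector per frame would be passed to the recurrent network (LSTM or GRU), and the fused distribution would give the final group activity label; those stages are omitted here.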
