Crowd Video Classification Using Convolutional Neural Networks

Deep learning tools such as the convolutional neural network (CNN) are widely used for image analysis and interpretation. Applying them to the corresponding tasks in videos is considerably more expensive, however, because the additional temporal information must also be held in memory. Crowd video analysis is a subarea of video analysis that has recently gained attention. In this paper, we show that a 2D CNN can classify videos when each video is represented as a three-channel image map computed from its spatial and temporal information. This reduces space and time complexity compared with the 3D CNNs conventionally used for video analysis. We evaluate our model against the state-of-the-art method of [1] on the dataset proposed there and, without any additional processing steps, improve upon their reported accuracy.
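The abstract does not state how the three channels are computed, so the following is only a minimal Python sketch under explicit assumptions. The hypothetical function `video_to_three_channel_map` collapses a T-frame grayscale clip into a 3 x H x W map using an assumed channel choice: temporal mean, temporal standard deviation, and mean absolute frame difference. The hypothetical helper `conv_cost` counts multiply-accumulates and output activations for a single convolution layer. Together they illustrate where a 2D network on such a map can save space and time over a 3D network on the raw clip.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame: H rows of W pixel values


def video_to_three_channel_map(frames: List[Frame]) -> List[Frame]:
    """Collapse a clip of T grayscale frames into a 3 x H x W image map.

    HYPOTHETICAL channel choice, for illustration only (not the paper's):
      channel 0: per-pixel temporal mean          (spatial appearance)
      channel 1: per-pixel temporal std. dev.     (how much a pixel varies)
      channel 2: mean absolute frame difference   (short-term motion)
    """
    t = len(frames)
    if t < 2:
        raise ValueError("need at least two frames to measure motion")
    h, w = len(frames[0]), len(frames[0][0])
    mean = [[0.0] * w for _ in range(h)]
    std = [[0.0] * w for _ in range(h)]
    motion = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            series = [frame[y][x] for frame in frames]  # this pixel over time
            m = sum(series) / t
            mean[y][x] = m
            std[y][x] = (sum((v - m) ** 2 for v in series) / t) ** 0.5
            motion[y][x] = sum(abs(b - a) for a, b in zip(series, series[1:])) / (t - 1)
    return [mean, std, motion]


def conv_cost(h: int, w: int, c_in: int, filters: int, k: int,
              t: int = 1, kt: int = 1) -> Tuple[int, int]:
    """(multiply-accumulates, output activations) of one convolution layer.

    Assumes stride 1 and 'same' padding, so the output keeps the input's
    spatial (and temporal) size. t = kt = 1 gives the 2D case; t > 1 frames
    with a kt-deep temporal kernel gives the 3D case.
    """
    outputs = filters * t * h * w
    macs = outputs * c_in * k * k * kt
    return macs, outputs


if __name__ == "__main__":
    # Toy 5-frame, 3x4 clip: pixel (y, x) in frame i has intensity i + y + x.
    clip = [[[float(i + y + x) for x in range(4)] for y in range(3)] for i in range(5)]
    image_map = video_to_three_channel_map(clip)
    print(len(image_map), len(image_map[0]), len(image_map[0][0]))  # 3 3 4

    # Illustrative sizes (assumed, not from the paper): 112x112 RGB frames,
    # 16-frame clips, 64 filters, 3x3 spatial and 3-frame temporal kernels.
    macs_2d, out_2d = conv_cost(112, 112, 3, 64, 3)
    macs_3d, out_3d = conv_cost(112, 112, 3, 64, 3, t=16, kt=3)
    print("2D on 3-channel map:", macs_2d, "MACs,", out_2d, "activations")
    print("3D on RGB clip:     ", macs_3d, "MACs,", out_3d, "activations")
    print("ratio:", macs_3d // macs_2d, out_3d // out_2d)  # ratio: 48 16
```

With these assumed sizes, the first 3D layer needs T·kt = 48 times as many multiply-accumulates and stores T = 16 times as many activations as the 2D layer. Building the map itself costs only one pass over the clip's pixels. Deeper layers and pooling change the exact ratios, so these numbers indicate the trend only.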

[1] Xiaogang Wang et al., Profiling stationary crowd groups, 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME).

[2] Shaogang Gong et al., Crowd Counting and Profiling: Methodology and Evaluation, 2013, Modeling, Simulation and Visual Analysis of Crowds.

[3] Ivan Laptev et al., Data-driven crowd analysis in videos, 2011, ICCV.

[4] Sergio A. Velastin et al., Crowd analysis: a survey, 2008, Machine Vision and Applications.

[5] Wen-Hsien Fang et al., Abnormal crowd behavior detection and localization using maximum sub-sequence search, 2013, ARTEMIS '13.

[6] Nuno Vasconcelos et al., Anomaly Detection and Localization in Crowded Scenes, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Peng Wang et al., Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition, 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[8] Jean-Luc Dugelay et al., Sparse Feature Tracking for Crowd Change Detection and Event Recognition, 2014, 2014 22nd International Conference on Pattern Recognition.

[9] Fei-Fei Li et al., Large-Scale Video Classification with Convolutional Neural Networks, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Robert T. Collins et al., Vision-Based Analysis of Small Groups in Pedestrian Crowds, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Xiaogang Wang et al., Scene-Independent Group Profiling in Crowd, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12] Matthew J. Hausknecht et al., Beyond short snippets: Deep networks for video classification, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Trevor Darrell et al., Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Xiaogang Wang et al., Fully Convolutional Neural Networks for Crowd Segmentation, 2014, arXiv.

[15] Nitish Srivastava et al., Exploiting Image-trained CNN Architectures for Unconstrained Video Classification, 2015, BMVC.

[16] Junmo Kim et al., Image Classification Using Convolutional Neural Networks With Multi-stage Feature, 2014, RiTA.

[17] Xiaogang Wang et al., Learning Scene-Independent Group Descriptors for Crowd Understanding, 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[18] Ivan Laptev et al., Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Xiaogang Wang et al., Slicing Convolutional Neural Network for Crowd Video Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Mubarak Shah et al., Abnormal crowd behavior detection using social force model, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Vittorio Murino et al., Joint Individual-Group Modeling for Tracking, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Geoffrey E. Hinton et al., ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[23] Vittorio Murino et al., Decentralized particle filter for joint individual-group tracking, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Xiaogang Wang et al., Deeply learned attributes for crowded scene understanding, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).