Towards Intelligent Crowd Behavior Understanding Through the STFD Descriptor Exploration

Automated, online detection of crowd anomalies from surveillance CCTV footage is a research-intensive task with strong application demand. This research proposes a novel technique for detecting crowd abnormalities by analyzing the spatial and temporal features of input video signals. The integrated solution defines an image descriptor, the spatio-temporal feature descriptor (STFD), that reflects the global motion pattern of crowds over time. A purpose-designed convolutional neural network (CNN) is then used to classify dominant or large-scale abnormal crowd behaviors. The reported work focuses on: (1) detecting moving objects in an online (or near real-time) manner through spatio-temporal segmentation of crowds, identified in the temporal domain by the similarity of group trajectory structures and in the spatial domain by foreground blocks extracted with a Gaussian mixture model; (2) dividing the segmented regions into multiple clustered groups using spectral clustering, treating the image pixels of those regions as dynamic particles; (3) creating STFD descriptor instances by calculating attributes such as collectiveness, stability, conflict and crowd density for the individuals (particles) in each group; (4) feeding the generated STFD descriptor instances into the devised CNN to detect suspicious crowd behaviors. The PETS database was selected as the primary experimental data set for testing and evaluating the devised models and techniques. Comparisons against benchmark models and systems show promising improvements in accuracy and efficiency for crowd anomaly detection.
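
To make steps (2) and (3) more concrete, the following is a minimal sketch rather than the method used in this work. It assumes particles are given as 2-D positions and velocities, splits them into two groups with a spectral bipartition (the sign of the Fiedler vector of a normalized graph Laplacian), and scores each group with a simplified collectiveness proxy (mean cosine similarity to the group's mean velocity). The function names, the Gaussian affinity, the `sigma_pos` and `sigma_vel` parameters, and the collectiveness formula are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def spectral_bipartition(positions, velocities, sigma_pos=1.0, sigma_vel=1.0):
    """Split particles into two groups (labels 0/1) by spectral clustering.

    Hypothetical helper: the affinity below (Gaussian in position and
    velocity differences) is an assumption, not the paper's definition.
    """
    # Pairwise squared differences in position and velocity.
    dp = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    dv = np.sum((velocities[:, None, :] - velocities[None, :, :]) ** 2, axis=-1)

    # Affinity: nearby particles with similar motion are strongly connected.
    W = np.exp(-dp / (2.0 * sigma_pos**2) - dv / (2.0 * sigma_vel**2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # Fiedler vector, whose sign pattern gives the two-way partition.
    _, eigvecs = np.linalg.eigh(L)
    return (eigvecs[:, 1] > 0).astype(int)


def collectiveness(velocities):
    """Simplified proxy: mean cosine between each velocity and the group mean.

    Returns a value in [-1, 1]; 1 means all particles move the same way.
    This is an illustrative stand-in for the descriptor attribute.
    """
    mean_vel = velocities.mean(axis=0)
    mean_norm = np.linalg.norm(mean_vel)
    speeds = np.linalg.norm(velocities, axis=1)
    if mean_norm == 0 or np.any(speeds == 0):
        return 0.0
    return float(np.mean(velocities @ mean_vel / (speeds * mean_norm)))


# Toy scene: two small groups walking toward each other.
positions = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                      [4.0, 0.0], [4.5, 0.0], [4.0, 0.5]])
velocities = np.array([[1.0, 0.0]] * 3 + [[-1.0, 0.0]] * 3)

labels = spectral_bipartition(positions, velocities)
for g in (0, 1):
    print(g, collectiveness(velocities[labels == g]))
```

In a fuller pipeline, per-group attributes such as this one would be assembled into descriptor instances and passed to a classifier. More than two groups would need k-way clustering on several eigenvectors instead of a single sign split.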
