Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM

Abstract Football is the most popular sport in the world, with four billion fans worldwide. Violence is reportedly frequent during and after matches. Violent or destructive behavior by a person who watches or plays the game in the stadium is known as football hooliganism. To prevent or control such violence, a real-time violence detection system is needed that monitors the behavior of the crowd and players so that action can be taken before violence occurs. The system must also determine whether an attack during the game is intentional or unintentional. This paper proposes a real-time violence detection system that processes large volumes of streaming input data and recognizes violence by simulating human intelligence. The input is a large number of real-time video streams from different sources, processed in the Spark framework. Within Spark, the streams are split into frames and the features of each frame are extracted using the Histogram of Oriented Gradients (HOG) function. The frames are then labeled, based on these features, as violence model, human part model, or negative model, and these labels are used to train a Bidirectional Long Short-Term Memory (BDLSTM) network to recognize violent scenes. A bidirectional LSTM accesses information in both the forward and reverse directions, so its output is generated in the context of both past and future information. The network is trained on the violent interaction dataset (VID), which contains 2314 videos: 1077 fight videos and 1237 no-fight videos. To make the model more robust, we also created a dataset of 410 non-violent and 409 violent video clips acquired from the football stadium. The model's performance is validated, showing the robustness of the system with an accuracy of 94.5% in recognizing violent actions.
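
The bidirectional idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`init_lstm`, `lstm_pass`, `bidirectional_lstm`), the randomly initialised weights, the hidden size, and the 36-dimensional per-frame vectors standing in for HOG descriptors are all assumptions made for illustration. It only shows how one LSTM reads the frame sequence forward, a second reads it in reverse, and the two hidden states are concatenated so that each time step carries both past and future context.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Randomly initialise one LSTM direction (untrained, for illustration only)."""
    scale = 0.1
    return {
        "W": rng.normal(0.0, scale, (4 * hidden_dim, input_dim)),   # input weights
        "U": rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim)),  # recurrent weights
        "b": np.zeros(4 * hidden_dim),                              # gate biases
    }

def lstm_pass(features, params):
    """Run one LSTM direction over features of shape (T, input_dim).

    Returns the hidden state at every time step, shape (T, hidden_dim).
    """
    hidden_dim = params["U"].shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x in features:
        z = params["W"] @ x + params["U"] @ h + params["b"]
        i, f, o, g = np.split(z, 4)          # input, forget, output gates and candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                    # update the cell state
        h = o * np.tanh(c)                   # expose the gated hidden state
        outputs.append(h)
    return np.stack(outputs)

def bidirectional_lstm(features, fwd, bwd):
    """Concatenate forward-in-time and backward-in-time hidden states per frame."""
    h_fwd = lstm_pass(features, fwd)
    # Process the reversed sequence, then flip back so index t lines up with frame t.
    h_bwd = lstm_pass(features[::-1], bwd)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 8, 36, 16               # hypothetical sizes
frame_features = rng.normal(size=(T, input_dim))   # stand-in for per-frame HOG vectors
fwd = init_lstm(input_dim, hidden_dim, rng)
bwd = init_lstm(input_dim, hidden_dim, rng)
states = bidirectional_lstm(frame_features, fwd, bwd)
print(states.shape)  # (8, 32)
```

In this sketch the first `hidden_dim` columns of each row depend only on the current and earlier frames, while the remaining columns depend on the current and later frames. A classifier placed on top of these concatenated states would therefore see context from both directions, which is the property the abstract attributes to the BDLSTM.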
