An improved hybridized deep structured model for accurate video event recognition

Video event recognition plays an important role in many research fields, particularly in surveillance systems. An existing approach, the deep hierarchical context model, performs recognition using contextual information at the feature, semantic, and priority levels. However, this method may perform poorly as the volume of video grows, and it may fail to predict events accurately when the contextual features are only weakly interrelated. This work addresses these open challenges with an improved hybridized deep structured model. Three primary kinds of contextual features are used to discriminate the resulting event neighborhood. A hybrid textual perceptual descriptor and concept-based attribute extraction are applied for accurate recognition of video events, and the extracted interaction context features are grouped using an improved K-means algorithm. In addition, an improved deep structured model that combines convolutional neural networks and conditional random fields is developed to learn middle-level representations and to combine bottom-level features, mid-level semantics, and top-level meanings for event identification. The proposed method is evaluated on the VIRAT dataset, with the simulation analysis carried out in MATLAB. The evaluation indicates that the proposed method recognizes events more accurately than the existing approach.
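The grouping step mentioned above can be illustrated with a minimal sketch. The abstract does not say what the "improved" K-means variant changes, so the code below is plain Lloyd-style K-means, not the authors' algorithm. The function names, the parameters, and the toy 2-D vectors standing in for interaction context features are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Vector],
    k: int,
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Standard (Lloyd) K-means; a stand-in for the unspecified improved variant.

    Returns one cluster label per point and the final centroids.
    """
    rng = random.Random(seed)
    # Assumption: centroids are initialised from k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: Optional[List[int]] = None

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return labels if labels is not None else [], centroids


if __name__ == "__main__":
    # Hypothetical feature vectors forming two well-separated groups.
    features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = kmeans(features, k=2)
    print(labels, centroids)
```

In a pipeline like the one described, the cluster labels would presumably act as a compact grouping of context features before they reach the deep structured model; how the paper uses them in detail is not stated in the abstract.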
