Deep Learning Video Action Recognition Method Based on Key Frame Algorithm

To address action recognition in short videos and to capture the key information in a video, this paper first proposes KGAF-means, a key frame extraction method. KGAF-means is based on the clustering principle: it combines the K-means algorithm with the artificial fish swarm algorithm to extract a key frame sequence. From the extracted key frame sequence, an improved dual-stream variable convolution network extracts features from the RGB images and the optical flow images separately. The image feature vector and the optical flow feature vector are then fused by the cascading method, and the fused feature vector is used for action recognition. Experiments on the public Charades dataset show that the method achieves an mAP of 22.9. The results also indicate that the proposed method is more robust than other network models and improves short video action recognition.
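The clustering step and the fusion step described above can be illustrated with a minimal sketch. It is not the paper's KGAF-means: it uses plain K-means only and omits the artificial fish swarm component, whose details the abstract does not give. It also assumes that each frame is already summarized by a small feature vector (for example, a color histogram) and that "cascading" means simple concatenation. The function names `kmeans_key_frames` and `fuse_features` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_key_frames(features, k, iterations=20, seed=0):
    """Cluster per-frame feature vectors with plain K-means and return the
    sorted indices of the frames closest to each cluster centroid.

    Assumption: this stands in for the paper's KGAF-means; the artificial
    fish swarm part is not implemented here.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    # Initialize centroids from k distinct, randomly chosen frames.
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]

    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, vec in enumerate(features):
            nearest = min(range(k), key=lambda c: squared_distance(vec, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [
                    sum(features[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]

    # The key frame of a cluster is the frame nearest to its centroid.
    key_frames = {
        min(range(len(features)), key=lambda i: squared_distance(features[i], centroid))
        for centroid in centroids
    }
    return sorted(key_frames)


def fuse_features(rgb_vector, flow_vector):
    """Fuse two feature vectors by concatenation (assumed meaning of 'cascading')."""
    return list(rgb_vector) + list(flow_vector)


# Toy input: six frames forming two visually distinct groups.
frame_features = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],        # frames 0-2
    [10.0, 10.0], [10.2, 10.0], [10.0, 10.1],  # frames 3-5
]
key_frames = kmeans_key_frames(frame_features, k=2)
fused = fuse_features([0.5, 0.1], [0.9])
```

On this toy input the sketch selects one key frame from each group, and the fused vector has the combined length of the two inputs. In a real pipeline the per-frame vectors would come from image descriptors, and the two vectors being fused would be the outputs of the RGB and optical flow streams.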
