Multi-Streams Network for Action Recognition

This paper proposes a multi-stream action recognition algorithm that combines depth information with the rich texture information in RGB video sequences. The algorithm uses DenseNet as the backbone network: color texture, optical flow, and depth information are obtained and used as the inputs to the stream networks. LSTMs are then used for feature fusion and action classification. In experiments on the public UTD-MHAD action recognition dataset, the algorithm achieves a recognition rate of 93.88%, which is significantly better than that of similar algorithms.
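The data flow described above (per-stream features, recurrent fusion, classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the streams are fused, so per-frame concatenation is an assumption here, as are the single LSTM layer, the layer sizes, and the random weights. The toy feature vectors stand in for the DenseNet outputs of each stream, and names such as `LSTMCell` and `classify_clip` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """A single LSTM cell with untrained (random) weights, for illustration."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate, acting on [x_t ; h_{t-1}].
        cols = input_size + hidden_size
        self.W = {gate: rand_matrix(hidden_size, cols, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {
            gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
            for gate in "ifog"
        }
        i = [sigmoid(v) for v in pre["i"]]       # input gate
        f = [sigmoid(v) for v in pre["f"]]       # forget gate
        o = [sigmoid(v) for v in pre["o"]]       # output gate
        cand = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def classify_clip(rgb_feats, flow_feats, depth_feats, cell, W_out):
    """Fuse three per-frame feature streams with an LSTM and classify the clip."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for rgb, flow, depth in zip(rgb_feats, flow_feats, depth_feats):
        fused = rgb + flow + depth  # assumed fusion: per-frame concatenation
        h, c = cell.step(fused, h, c)
    return softmax(matvec(W_out, h))  # class probabilities from the last state


rng = random.Random(0)
# Toy sizes; 27 is the number of action classes in UTD-MHAD.
T, feat_dim, hidden, num_classes = 8, 4, 6, 27


def toy_stream():
    # Stand-in for the per-frame features a DenseNet stream would produce.
    return [[rng.random() for _ in range(feat_dim)] for _ in range(T)]


rgb_feats, flow_feats, depth_feats = toy_stream(), toy_stream(), toy_stream()
cell = LSTMCell(3 * feat_dim, hidden, rng)
W_out = rand_matrix(num_classes, hidden, rng)

probs = classify_clip(rgb_feats, flow_feats, depth_feats, cell, W_out)
predicted = max(range(num_classes), key=probs.__getitem__)
```

Because the weights are random, `predicted` carries no meaning; the sketch only shows how three synchronized feature sequences could be merged into one recurrent state before classification. A trained system would learn the LSTM and output weights jointly with, or on top of, the stream networks.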
