Dual attention convolutional network for action recognition

Action recognition has been an active research area for many years. Extracting discriminative spatial and temporal features of different actions plays a key role in accomplishing this task. Current popular methods are mainly based on two-stream Convolutional Networks (ConvNets) or 3D ConvNets. However, two-stream ConvNets are computationally expensive because they require optical flow, while 3D ConvNets consume too much memory because they have a large number of parameters. To alleviate these problems, the authors propose a Dual Attention ConvNet (DANet) built on a dual attention mechanism consisting of spatial attention and temporal attention. The former concentrates on the main moving objects in a video frame using a ConvNet structure, while the latter captures related information across multiple video frames through self-attention. The network is built entirely on 2D ConvNets and takes only RGB frames as input. Experimental results on the UCF-101 and HMDB-51 benchmarks demonstrate that DANet achieves results comparable to leading methods, which supports the effectiveness of the dual attention mechanism.
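
The paper's exact architecture is not reproduced here, but a minimal sketch of a dual attention block along these lines might look as follows. This is PyTorch-style illustration only: all module names, feature shapes, the sigmoid gating, and the way the two attentions are composed are assumptions, not the authors' specification, and the 2D backbone that would normally produce the per-frame features is omitted for brevity.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Per-frame attention map over spatial locations (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        # Small ConvNet producing a single-channel attention map per frame
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels // 8, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, 1, kernel_size=1),
        )

    def forward(self, x):                    # x: (batch*frames, C, H, W)
        attn = torch.sigmoid(self.conv(x))   # (batch*frames, 1, H, W)
        return x * attn                      # reweight spatial locations

class TemporalAttention(nn.Module):
    """Self-attention across frame-level features (assumed design)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (batch, frames, dim)
        out, _ = self.attn(x, x, x)          # relate every frame to every other
        return out

class DualAttentionNet(nn.Module):
    """Sketch of a DANet-like pipeline operating on RGB frames only."""
    def __init__(self, channels=512, num_classes=101):
        super().__init__()
        self.spatial = SpatialAttention(channels)
        self.temporal = TemporalAttention(channels)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                    # x: (batch, frames, C, H, W)
        b, t, c, h, w = x.shape
        x = self.spatial(x.view(b * t, c, h, w))  # 2D spatial attention
        x = x.mean(dim=(2, 3)).view(b, t, c)      # pool to frame features
        x = self.temporal(x)                      # self-attention over time
        return self.fc(x.mean(dim=1))             # average frames, classify
```

Because every operation here is 2D convolution or attention over frame features, such a design avoids both the optical-flow preprocessing of two-stream methods and the parameter growth of 3D convolutions, which is the trade-off the abstract highlights.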
