Body-Part-Aware and Multitask-Aware Single-Image-Based Action Recognition

Action recognition is an application that ideally requires real-time results. We focus on single-image-based rather than video-based action recognition because it is faster and computationally cheaper. However, a single image contains limited information, which makes single-image-based action recognition a difficult problem. To obtain an accurate representation of action classes, we propose three shallow feature-stream sub-networks (image-based, attention-image-based, and part-image-based feature networks) built on top of a deep pose estimation network and trained in a multitask manner. Moreover, we design a multitask-aware loss function so that the proposed method can be trained adaptively on heterogeneous datasets in which each sample carries only human pose annotations or only action labels, rather than both. This makes it easier to apply the proposed approach to new data for behavioral analysis in intelligent systems. Our extensive experiments show that the three streams carry complementary information and, hence, that the fused representation is robust in distinguishing diverse fine-grained action classes. Unlike in other methods, the human pose information is learned from heterogeneous datasets in a multitask manner; nevertheless, the proposed method achieves 91.91% mean average precision on the Stanford 40 Actions Dataset. We also demonstrate that the proposed method can be flexibly applied to the multi-label action recognition problem on the V-COCO Dataset.
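The abstract does not give the form of the multitask-aware loss, so the following is only a minimal sketch of one common way to train on heterogeneous data: each sample contributes a loss term only for the annotations it actually has. The function names, the dictionary keys, the squared-error pose term, the softmax cross-entropy action term, and the weights `w_pose` and `w_action` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def pose_squared_error(pred, target):
    """Mean squared error between predicted and annotated pose values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def masked_multitask_loss(batch, w_pose=1.0, w_action=1.0):
    """Hypothetical masked loss: a missing annotation (None) adds no term.

    Each sample is a dict with predictions ("pose_pred", "action_logits")
    and optional annotations ("pose_target", "action_label").
    """
    pose_terms, action_terms = [], []
    for sample in batch:
        if sample.get("pose_target") is not None:
            pose_terms.append(
                pose_squared_error(sample["pose_pred"], sample["pose_target"])
            )
        if sample.get("action_label") is not None:
            action_terms.append(
                softmax_cross_entropy(sample["action_logits"], sample["action_label"])
            )
    # Average each task over the samples that carry its annotation only.
    pose_loss = sum(pose_terms) / len(pose_terms) if pose_terms else 0.0
    action_loss = sum(action_terms) / len(action_terms) if action_terms else 0.0
    return w_pose * pose_loss + w_action * action_loss


# Usage: one pose-only sample and one action-only sample in the same batch.
example_batch = [
    {"pose_pred": [0.0, 0.0], "pose_target": [1.0, 1.0],
     "action_logits": [0.0, 0.0], "action_label": None},
    {"pose_pred": [0.5, 0.5], "pose_target": None,
     "action_logits": [0.0, 0.0], "action_label": 0},
]
example_loss = masked_multitask_loss(example_batch)
```

Averaging each task over only its annotated samples keeps the scale of either term from depending on how many samples in a batch happen to lack that annotation; whether the paper normalizes this way is not stated in the abstract.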
