Exploring Techniques to Improve Activity Recognition using Human Pose Skeletons

Human pose skeletons provide an explainable representation of the orientation of a person. Neural network architectures such as OpenPose can estimate the 2D pose skeletons of the people present in an image with good accuracy, which makes human pose an attractive representation for building human activity recognition systems. However, raw pose keypoint representations suffer from several problems: they vary with the translation and scale of the input image, and the pose estimation framework often misses keypoints. These and other factors lead to poor learning and generalization in networks trained directly on raw representations. This paper introduces methods for building a robust representation for training activity recognition models, such as handcrafted features extracted from poses with the intent of introducing scale and translation invariance. Additionally, train-time techniques such as keypoint dropout are explored to facilitate better learning. Finally, we conduct an ablation study comparing the performance of deep learning models trained on raw keypoint representations and on handcrafted features, while incorporating our train-time techniques, to quantify the effectiveness of the introduced methods over raw representations.
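As a rough illustration of the two ideas named in the abstract, the sketch below shows one possible way to make a 2D pose invariant to translation and scale, and one possible form of keypoint dropout. It is not the paper's implementation: the centroid-and-radius normalization, the use of `None` for a missed keypoint, and the function names `normalize_pose` and `keypoint_dropout` are all assumptions made for this example.

```python
import math
import random
from typing import List, Optional, Tuple

# Assumption: a keypoint is an (x, y) pair, or None when the estimator missed it.
Keypoint = Optional[Tuple[float, float]]


def normalize_pose(pose: List[Keypoint]) -> List[Keypoint]:
    """Center detected keypoints on their centroid and divide by the largest
    distance from it, so the output ignores image translation and scale.
    (Hypothetical normalization, chosen only for illustration.)"""
    present = [kp for kp in pose if kp is not None]
    if not present:
        return list(pose)
    cx = sum(x for x, _ in present) / len(present)
    cy = sum(y for _, y in present) / len(present)
    scale = max(math.hypot(x - cx, y - cy) for x, y in present)
    if scale == 0.0:
        scale = 1.0  # all detected keypoints coincide; avoid dividing by zero
    return [
        None if kp is None else ((kp[0] - cx) / scale, (kp[1] - cy) / scale)
        for kp in pose
    ]


def keypoint_dropout(pose: List[Keypoint], p: float, rng: random.Random) -> List[Keypoint]:
    """Train-time augmentation: independently mark each detected keypoint as
    missing with probability p, imitating estimator failures."""
    return [None if kp is None or rng.random() < p else kp for kp in pose]


if __name__ == "__main__":
    pose = [(0.0, 0.0), (2.0, 0.0), None, (1.0, 3.0)]
    # The same pose, shifted and enlarged, as it might appear in another image.
    moved = [None if kp is None else (10 + 3 * kp[0], 20 + 3 * kp[1]) for kp in pose]
    print(normalize_pose(pose))
    print(normalize_pose(moved))  # matches the line above up to rounding
    print(keypoint_dropout(pose, p=0.5, rng=random.Random(0)))
```

Under these assumptions, a model trained on the normalized output sees the same input for a pose regardless of where or how large the person appears, and applying the dropout only during training exposes it to the missing-keypoint pattern it will meet at inference time.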
