EfficientPose: Scalable single-person pose estimation

Human pose estimation facilitates markerless movement analysis in sports and in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements for real-life deployment, mainly because approaches have become more computationally expensive as the field has progressed. To cope with the challenges caused by this trend, we propose a convolutional neural network architecture that builds on the recently proposed EfficientNets to deliver scalable single-person pose estimation. To this end, we introduce EfficientPose, a family of models that combines an effective multi-scale feature extractor, computationally efficient detection blocks based on mobile inverted bottleneck convolutions, and upscaling that improves the precision of pose configurations. EfficientPose enables real-world deployment on edge devices through a 500K-parameter model consuming less than one GFLOP. Experiments on the challenging MPII single-person benchmark show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in accuracy, while being up to 15 times smaller and 20 times more computationally efficient.
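To give a rough sense of why mobile inverted bottleneck convolutions are considered computationally efficient, the sketch below counts the weights of one such block and compares them with a standard convolution operating at the same expanded width. It is a minimal illustration and not the EfficientPose implementation: the function names, the channel sizes, and the expansion ratio are assumptions chosen for the example, and biases, batch normalization, and squeeze-and-excitation parameters are left out.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def mbconv_params(c_in: int, c_out: int, k: int, expand_ratio: int) -> int:
    """Weights in a simplified mobile inverted bottleneck block.

    Assumed structure: 1x1 expansion, k x k depthwise convolution,
    1x1 linear projection. Normalization, bias, and squeeze-and-excitation
    weights are ignored in this sketch.
    """
    c_mid = c_in * expand_ratio
    expand = c_in * c_mid        # 1x1 pointwise expansion
    depthwise = k * k * c_mid    # one k x k filter per expanded channel
    project = c_mid * c_out      # 1x1 pointwise projection
    return expand + depthwise + project


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    c_in, c_out, k, expand_ratio = 40, 40, 3, 6
    c_mid = c_in * expand_ratio

    block = mbconv_params(c_in, c_out, k, expand_ratio)
    dense = standard_conv_params(c_mid, c_mid, k)

    print(f"inverted bottleneck block: {block} weights")
    print(f"dense 3x3 conv at width {c_mid}: {dense} weights")
    print(f"ratio: {dense / block:.1f}x")
```

Under these assumptions the block filters spatially at the expanded width while using far fewer weights than a dense convolution of that width, because the depthwise step applies a single filter per channel and the channel mixing is handled by 1x1 convolutions.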
