EfficientPose: Efficient human pose estimation with neural architecture search

Human pose estimation from image and video is a vital task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on resource-constrained devices. Nowadays real-time multimedia applications call for more efficient models for better interactions. Moreover, most deep neural networks for pose estimation directly reuse the networks designed for image classification as the backbone, which are not yet optimized for the pose estimation task. In this paper, we propose an efficient framework targeted at human pose estimation including two parts, the efficient backbone and the efficient head. By implementing the differentiable neural architecture search method, we customize the backbone network design for pose estimation and reduce the computation cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to promote the performance of the final prediction. In experiments, we evaluate our networks on the MPII and COCO datasets. Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model, i.e., HRNet with 9.5 GFLOPs.

[1]  Li Fei-Fei,et al.  Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Tao Mei,et al.  Customizable Architecture Search for Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Theodore Lim,et al.  SMASH: One-Shot Model Architecture Search through HyperNetworks , 2017, ICLR.

[5]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Quoc V. Le,et al.  Understanding and Simplifying One-Shot Architecture Search , 2018, ICML.

[7]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[8]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Georgios Tzimiropoulos,et al.  Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[12]  Ameet Talwalkar,et al.  Random Search and Reproducibility for Neural Architecture Search , 2019, UAI.

[13]  Mao Ye,et al.  Fast Human Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[15]  Xiaopeng Zhang,et al.  PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search , 2019, ArXiv.

[16]  Gang Yu,et al.  Rethinking on Multi-Stage Networks for Human Pose Estimation , 2019, ArXiv.

[17]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[18]  Ying Wu,et al.  Deeply Learned Compositional Models for Human Pose Estimation , 2018, ECCV.

[19]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Xinggang Wang,et al.  Fast Neural Network Adaptation via Parameter Remapping and Architecture Search , 2020, ICLR.

[22]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Gaofeng Meng,et al.  EAT-NAS: elastic architecture transfer for accelerating large-scale neural architecture search , 2019, Science China Information Sciences.

[24]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jie Tang,et al.  Simple and Lightweight Human Pose Estimation , 2019, ArXiv.

[26]  Zhen Cui,et al.  Pose Neural Fabrics Search , 2019, ArXiv.

[27]  Xiangyu Zhang,et al.  Learning Delicate Local Representations for Multi-Person Pose Estimation , 2020, ECCV.

[28]  Dimitris N. Metaxas,et al.  Quantized Densely Connected U-Nets for Efficient Landmark Localization , 2018, ECCV.

[29]  Vincent Dumoulin,et al.  Deconvolution and Checkerboard Artifacts , 2016 .

[30]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Xinggang Wang,et al.  FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search , 2020, IEEE transactions on pattern analysis and machine intelligence.

[35]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[36]  Qian Zhang,et al.  Densely Connected Search Space for More Flexible Neural Architecture Search , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yi Yang,et al.  Searching for a Robust Neural Architecture in Four GPU Hours , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[39]  Sergio Guadarrama,et al.  The Devil is in the Decoder , 2017, BMVC.

[40]  Quoc V. Le,et al.  NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[42]  Hitoshi Kiya,et al.  Checkerboard artifacts free convolutional neural networks , 2019, APSIPA Transactions on Signal and Information Processing.

[43]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Dacheng Tao,et al.  A Coarse-Fine Network for Keypoint Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Xiaopeng Zhang,et al.  PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search , 2020, ICLR.

[47]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[48]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[49]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.