论文信息 - Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition makes an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms. However, to achieve high accuracy, state-of-the-art methods tend to have a large model size and complex post-processing algorithm, which costs intense computation and long end-toend latency. To solve this problem, we propose an architecture optimization and weight pruning framework to accelerate inference of multi-person pose estimation on mobile devices. With our optimization framework, we achieve up to 2.51× faster model inference speed with higher accuracy compared to representative lightweight multi-person pose estimator.

[1] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2] Dingwen Tao,et al. ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning , 2021, ICS.

[3] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[4] Cewu Lu,et al. RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[5] Ping Liu,et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Xue Lin,et al. Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications , 2021, 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC).

[8] Yanzhi Wang,et al. YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design , 2020, AAAI.

[9] Jonathan Tompson,et al. PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model , 2018, ECCV.

[10] Yanzhi Wang,et al. Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization , 2020, ArXiv.

[11] Zhiao Huang,et al. Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[12] Dima Damen,et al. Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Daniil Osokin,et al. Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose , 2018, ICPRAM.