A Multi-Level Network for Human Pose Estimation

Although multi-person human pose estimation has made great progress in recent years, challenges such as varying person scales, occluded keypoints, and crowded backgrounds in complex scenes remain unsolved. In this paper, we propose a novel multi-level pose estimation network (MLPE) that learns multi-level features preserving both strong semantic cues and spatial resolution for keypoint prediction and localization. More specifically, we first propose a multi-level prediction network with a feature enhancement strategy, which learns multi-level features that achieve a good trade-off between global context information and spatial resolution. We then build a high-resolution fine network that uses transposed convolutions to restore high-resolution spatial information and accurately locate the keypoints. Extensive experiments on the challenging MS COCO dataset demonstrate the effectiveness of the proposed method. Code† and experimental results are publicly available online for further research.