HPERL: 3D Human Pose Estimation from RGB and LiDAR

In-the-wild human pose estimation has a huge potential for various fields, ranging from animation and action recognition to intention recognition and prediction for autonomous driving. The current state-of-the-art is focused only on RGB and RGB-D approaches for predicting the 3D human pose. However, not using precise LiDAR depth information limits the performance and leads to very inaccurate absolute pose estimation. With LiDAR sensors becoming more affordable and common on robots and autonomous vehicle setups, we propose an end-to-end architecture using RGB and LiDAR to predict the absolute 3D human pose with unprecedented precision. Additionally, we introduce a weakly-supervised approach to generate 3D predictions using 2D pose annotations from PedX [1]. This allows for many new opportunities in the field of 3D human pose estimation.

[1]  Didier Stricker,et al.  LRPD: Long Range 3D Pedestrian Detection Leveraging Specific Strengths of LiDAR and RGB , 2020, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC).

[2]  Wolfram Burgard,et al.  3D Human Pose Estimation in RGBD Images for Robotic Task Learning , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Iasonas Kokkinos,et al.  DensePose: Dense Human Pose Estimation in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[7]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[8]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[9]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Shu Liu,et al.  IPOD: Intensive Point-based Object Detector for Point Cloud , 2018, ArXiv.

[12]  Matthew Johnson-Roberson,et al.  PedX: Benchmark Dataset for Metric 3-D Pose Estimation of Pedestrians in Complex Urban Intersections , 2018, IEEE Robotics and Automation Letters.

[13]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[15]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[19]  Cordelia Schmid,et al.  LCR-Net++: Multi-Person 2D and 3D Pose Detection in Natural Images , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Johann Marius Zöllner,et al.  Direct 3D Detection of Vehicles in Monocular Images with a CNN based 3D Decoder , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[23]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[24]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[28]  Dacheng Tao,et al.  A Coarse-Fine Network for Keypoint Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[32]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[33]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Steven Lake Waslander,et al.  Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[36]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Steven L. Waslander,et al.  Improving 3D Object Detection for Pedestrians with Virtual Multi-View Synthesis Orientation Estimation , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[40]  Cordelia Schmid,et al.  LCR-Net: Localization-Classification-Regression for Human Pose , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).