Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views

This paper addresses the problem of 3D pose estimation for multiple people in a few calibrated camera views. The main challenge of this problem is to find the cross-view correspondences among noisy and incomplete 2D pose predictions. Most previous methods address this challenge by directly reasoning in 3D using a pictorial structure model, which is inefficient due to the huge state space. We propose a fast and robust approach to solve this problem. Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views. Each resulting cluster encodes 2D poses of the same person across different views and consistent correspondences across the keypoints, from which the 3D pose of each person can be effectively inferred. The proposed convex optimization based multi-way matching algorithm is efficient and robust against missing and false detections, without knowing the number of people in the scene. Moreover, we propose to combine geometric and appearance cues for cross-view matching. The proposed approach achieves significant performance gains from the state-of-the-art (96.3% vs. 90.6% and 96.9% vs. 88% on the Campus and Shelf datasets, respectively), while being efficient for real-time applications.

[1]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Luc Van Gool,et al.  Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities , 2011, NIPS.

[3]  David J. Fleet,et al.  Dynamical binary latent variable models for 3D human pose tracking , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Stefan Carlsson,et al.  3D Pictorial Structures for Multiple View Articulated Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Leonidas J. Guibas,et al.  Consistent Shape Maps via Semidefinite Programming , 2013, SGP '13.

[6]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Yichen Wei,et al.  Compositional Human Pose Regression , 2018, Comput. Vis. Image Underst..

[8]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jonathan Tompson,et al.  Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[11]  Emre Akbas,et al.  MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network , 2018, ECCV.

[12]  Nicolas Padoy,et al.  A generalizable approach for multi-view 3D human pose regression , 2018, Machine Vision and Applications.

[13]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Nassir Navab,et al.  3D Pictorial Structures for Multiple Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Pascal Fua,et al.  Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Dacheng Tao,et al.  A Coarse-Fine Network for Keypoint Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Shohreh Kasaei,et al.  Multiple human 3D pose estimation from multiview images , 2017, Multimedia Tools and Applications.

[23]  Nassir Navab,et al.  Multiple Human Pose Estimation with Temporally Consistent 3D Pictorial Structures , 2014, ECCV Workshops.

[24]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[25]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Motion Capture , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Michael Isard,et al.  Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation , 2011, International Journal of Computer Vision.

[29]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Xiaowei Zhou,et al.  Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[33]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Xiaowei Zhou,et al.  Ordinal Depth Supervision for 3D Human Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[37]  Xiaowei Zhou,et al.  Multi-image Matching via Fast Alternating Minimization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Yi Yang,et al.  Camera Style Adaptation for Person Re-identification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[40]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[43]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Interaction Capture , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[45]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.