Multi-Person 3D Pose Estimation and Tracking in Sports

We present an approach to multi-person 3D pose estimation and tracking from multi-view video. After independent 2D pose detection in each view, we (1) correct errors in the output of the pose detector; (2) associate 2D pose detections between camera views with a fast greedy algorithm; and (3) use the associated poses to generate and track 3D skeletons. Previous methods for estimating the skeletons of multiple people suffer from long processing times or rely on appearance cues, which limits their applicability to sports. Our association step seeks the best correspondences first, in a greedy fashion, while reasoning about the cyclic nature of correspondences to constrain the search. We generate 3D skeletons from the associated poses by robust triangulation. Our method can track 3D skeletons in the presence of missing detections, substantial occlusions, and large calibration errors. We believe ours is the first method for full-body 3D pose estimation and tracking of multiple players in highly dynamic sports scenes. The proposed method is significantly faster than state-of-the-art methods.
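The greedy association in step (2) can be illustrated with a minimal sketch. It assumes that pairwise affinity scores between detections in different views are already available (for example derived from epipolar distances between joints; the scoring function is not specified here), and it represents each detection by a hypothetical `(view_index, detection_index)` tuple. Pairs are visited from highest to lowest affinity and merged with a union-find structure; a merge is rejected if it would place two detections from the same view in one cluster, a simple rule that keeps every cycle of matches across views consistent. This is a simplified stand-in for the cycle reasoning described above, not the authors' algorithm, and the example scores are toy values.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (view_index, detection_index_within_view).
Detection = Tuple[int, int]


def greedy_associate(
    affinities: Dict[Tuple[Detection, Detection], float],
    min_affinity: float = 0.0,
) -> List[List[Detection]]:
    """Group 2D pose detections from several views into per-person clusters.

    Candidate pairs are merged greedily in order of decreasing affinity.
    A merge is refused if the resulting cluster would contain two detections
    from the same camera view, which keeps matches consistent around cycles
    of views (if A~B and B~C, then A and C end up in the same person).
    """
    parent: Dict[Detection, Detection] = {}
    cluster_views: Dict[Detection, Set[int]] = {}  # root -> views in cluster

    def find(d: Detection) -> Detection:
        if d not in parent:  # first time we see this detection
            parent[d] = d
            cluster_views[d] = {d[0]}
        root = d
        while parent[root] != root:
            root = parent[root]
        while parent[d] != root:  # path compression
            parent[d], d = root, parent[d]
        return root

    # Register every detection so unmatched ones survive as singletons.
    for a, b in affinities:
        find(a)
        find(b)

    ranked = sorted(affinities.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), score in ranked:
        if score < min_affinity:
            break  # all remaining pairs are weaker still
        if a[0] == b[0]:
            continue  # detections in the same view never belong together
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already associated through other views
        if cluster_views[ra] & cluster_views[rb]:
            continue  # merge would put two detections from one view together
        parent[rb] = ra
        cluster_views[ra] |= cluster_views.pop(rb)

    clusters: Dict[Detection, List[Detection]] = {}
    for d in list(parent):
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Two people seen by three views; the 0.7 pair is a spurious cross-match.
    scores = {
        ((0, 0), (1, 0)): 0.9,
        ((1, 0), (2, 0)): 0.85,
        ((0, 1), (1, 1)): 0.8,
        ((1, 1), (2, 1)): 0.75,
        ((0, 0), (2, 1)): 0.7,
        ((0, 1), (2, 1)): 0.6,
    }
    print(greedy_associate(scores))
    # [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]
```

Because pairs are processed strictly in order of decreasing affinity, a strong early match can block a weaker but correct one later; the same-view constraint limits such errors but does not eliminate them.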