SmartMocap: Joint Estimation of Human and Camera Motion Using Uncalibrated RGB Cameras

Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this paper, we propose a mocap method that uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions, unlike existing methods, which represent bodies in the camera coordinate frame. Second, we learn a probability distribution of short human motion sequences ($\sim$1 s) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints in the images. Finally, we show that our method works on a variety of datasets, ranging from aerial cameras to smartphones. It also gives more accurate results than the state of the art on the task of monocular human mocap with a static camera. Our code is available for research purposes at https://github.com/robot-perception-group/SmartMocap.
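To make the keypoint-fitting idea concrete, the following is a minimal sketch of a confidence-weighted 2D reprojection term with the body joints and the camera pose both expressed relative to a ground-plane world frame. It is not the authors' implementation: the function names, the pinhole intrinsics `K`, the z-up world convention, and the toy values are all assumptions made for illustration, and the SMPL model, the learned motion prior, and the multi-stage optimization are omitted.

```python
import numpy as np

def project(joints_world, R_wc, t_wc, K):
    """Project Nx3 world-frame points to pixels with a pinhole camera.

    R_wc (3x3) and t_wc (3,) map ground-plane world coordinates to
    camera coordinates: p_cam = R_wc @ p_world + t_wc.
    """
    pts_cam = joints_world @ R_wc.T + t_wc   # Nx3 points in the camera frame
    uv = pts_cam @ K.T                       # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]            # perspective division

def reprojection_loss(joints_world, keypoints_2d, conf, R_wc, t_wc, K):
    """Confidence-weighted sum of squared pixel errors over all joints."""
    residual = project(joints_world, R_wc, t_wc, K) - keypoints_2d
    return float(np.sum(conf * np.sum(residual ** 2, axis=1)))

# Hypothetical setup: world z is up and the ground plane is z = 0.
# The camera sits 1.5 m above the ground and looks along world +y.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R_wc = np.array([[1.0, 0.0, 0.0],    # camera x = world x
                 [0.0, 0.0, -1.0],   # camera y (down) = -world z
                 [0.0, 1.0, 0.0]])   # camera z (forward) = world y
cam_center_world = np.array([0.0, 0.0, 1.5])
t_wc = -R_wc @ cam_center_world

# Two toy "joints": one at camera height 3 m ahead, one on the ground.
joints_world = np.array([[0.0, 3.0, 1.5],
                         [0.2, 3.0, 0.0]])
conf = np.ones(len(joints_world))

# Synthetic detections: exact projections shifted by one pixel in u.
keypoints_2d = project(joints_world, R_wc, t_wc, K) + np.array([1.0, 0.0])
loss = reprojection_loss(joints_world, keypoints_2d, conf, R_wc, t_wc, K)
```

In a full pipeline, a term of this kind would be summed over cameras and frames and minimized jointly over the body parameters and the camera poses. With the keypoints alone, human motion and camera motion can trade off against each other, which is the ambiguity the motion prior described above is meant to resolve.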
