ActiveMoCap: Optimized Drone Flight for Active Human Motion Capture

The accuracy of monocular 3D human pose estimation depends on the viewpoint from which the image is captured. While camera-equipped drones provide control over this viewpoint, automatically positioning them at the location that will yield the highest accuracy remains an open problem, which we address in this paper. Specifically, given a short video sequence, we introduce an algorithm that predicts where a drone should go in the future frame so as to maximize 3D human pose estimation accuracy. A key idea underlying our approach is a method to estimate the uncertainty of the 3D body pose estimates. We integrate several sources of uncertainty, originating from deep-learning-based regressors and temporal smoothness. The resulting motion planner leads to improved 3D body pose estimates and outperforms or matches existing planners that are based on person following and orbiting.
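The viewpoint selection step described above can be illustrated with a minimal sketch. It assumes that a finite set of candidate drone positions is available and that some function scores the expected pose uncertainty at each one. The names `select_next_viewpoint`, `toy_uncertainty`, and `Position` are hypothetical and do not come from the paper. The toy score is a placeholder, not the paper's uncertainty estimate.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical (x, y, z) drone position; the representation is an assumption.
Position = Tuple[float, float, float]


def select_next_viewpoint(
    candidates: Sequence[Position],
    predicted_uncertainty: Callable[[Position], float],
) -> Position:
    """Return the candidate whose predicted pose uncertainty is lowest."""
    if not candidates:
        raise ValueError("at least one candidate viewpoint is required")
    return min(candidates, key=predicted_uncertainty)


def toy_uncertainty(position: Position) -> float:
    """Placeholder score that pretends uncertainty is lowest at x = 1."""
    # Assumption for illustration only: the paper integrates regressor-based
    # and temporal-smoothness sources of uncertainty, not reproduced here.
    return (position[0] - 1.0) ** 2


candidates = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
best = select_next_viewpoint(candidates, toy_uncertainty)
print(best)
```

In practice the scoring function would be replaced by an estimate of the 3D pose uncertainty expected from each candidate position. The selection itself remains a simple minimization over reachable viewpoints.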
