DronePose: Photorealistic UAV-Assistant Dataset Synthesis for 3D Pose Estimation via a Smooth Silhouette Loss

In this work we consider UAVs as cooperative agents supporting human users in their operations. In this context, the 3D localisation of the UAV assistant is an important task that can facilitate the exchange of spatial information between the user and the UAV. To address this in a data-driven manner, we design a data synthesis pipeline to create a realistic multimodal dataset that includes both the exocentric user view, and the egocentric UAV view. We then exploit the joint availability of photorealistic and synthesized inputs to train a single-shot monocular pose estimation model. During training we leverage differentiable rendering to supplement a state-of-the-art direct regression objective with a novel smooth silhouette loss. Our results demonstrate its qualitative and quantitative performance gains over traditional silhouette objectives. Our data and code are available at this https URL

[1]  Sven Behnke,et al.  Refining 6D Object Pose Predictions using Abstract Render-and-Compare , 2019, 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids).

[2]  Sven Behnke,et al.  RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Antonio Fernández-Caballero,et al.  Computer Vision in Autonomous Unmanned Aerial Vehicles—A Systematic Mapping Study , 2019, Applied Sciences.

[4]  Devis Tuia,et al.  When a Few Clicks Make All the Difference: Improving Weakly-Supervised Wildlife Detection in UAV Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[6]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[7]  Silvio Savarese,et al.  Forecasting Social Navigation in Crowded Complex Scenes , 2016, ArXiv.

[8]  Petros Daras,et al.  Drone-vs-Bird Detection Challenge at IEEE AVSS2019 , 2019, 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[9]  Pierre Alliez,et al.  Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark , 2017, 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Yang Gao,et al.  Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Shaojie Shen,et al.  Real-Time Dense Stereo Embedded in a UAV for Road Inspection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Slobodan Ilic,et al.  DPOD: 6D Pose Object Detector and Refiner , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Tae-Kyun Kim,et al.  Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Winston H. Hsu,et al.  Drone-Based Object Counting by Spatially Regularized Regional Proposal Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[18]  Larry S. Davis,et al.  AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video , 2011, AVSS.

[19]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Sertac Karaman,et al.  FlightGoggles: Photorealistic Sensor Simulation for Perception-driven Robotics using Photogrammetry and Virtual Reality , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[21]  Sertac Karaman,et al.  The Blackbird Dataset: A large-scale dataset for UAV perception in aggressive flight , 2018, ISER.

[22]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Mei Wang,et al.  Deep Visual Domain Adaptation: A Survey , 2018, Neurocomputing.

[24]  Bernard Ghanem,et al.  Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications , 2017, International Journal of Computer Vision.

[25]  Yuhua Qi,et al.  Drone Detection and Pose Estimation Using Relational Graph Networks , 2019, Sensors.

[26]  Anoop Cherian,et al.  Unsupervised Joint 3D Object Model Learning and 6D Pose Estimation for Depth-Based Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[27]  Hao Li,et al.  Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.

[28]  Roland Siegwart,et al.  The EuRoC micro aerial vehicle datasets , 2016, Int. J. Robotics Res..

[29]  Xiangyang Ji,et al.  CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Peter V. Gehler,et al.  3D2PM - 3D Deformable Part Models , 2012, ECCV.

[32]  Silvio Savarese,et al.  DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  René Vidal,et al.  3D Pose Regression Using Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Andrea Palazzi,et al.  End-to-End 6-DoF Object Pose Estimation Through Differentiable Rasterization , 2018, ECCV Workshops.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  P. Fua,et al.  Detecting Flying Objects Using a Single Moving Camera , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[39]  Dieter Schmalstieg,et al.  Drone-Augmented Human Vision: Exocentric Control for Drones Exploring Hidden Areas , 2018, IEEE Transactions on Visualization and Computer Graphics.

[40]  Andrew C. Jones,et al.  Fully automatic ID mattes with support for motion blur and transparency , 2015, SIGGRAPH Posters.

[41]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jiming Chen,et al.  Anti-Drone System with Multiple Surveillance Technologies: Architecture, Implementation, and Challenges , 2018, IEEE Communications Magazine.

[43]  Mathieu Aubry,et al.  Convolutional Neural Networks for joint object detection and pose estimation: A comparative study , 2014, ArXiv.

[44]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Yi Zhou,et al.  On the Continuity of Rotation Representations in Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jaakko Lehtinen,et al.  Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer , 2019, NeurIPS.

[47]  Hujun Bao,et al.  PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Theocharis Theocharides,et al.  Deep-Learning-Based Aerial Image Classification for Emergency Response Applications Using Unmanned Aerial Vehicles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[49]  Tae-Kyun Kim,et al.  Latent-Class Hough Forests for 6 DoF Object Pose Estimation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[51]  Cees Snoek,et al.  Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[53]  Xin Zhang,et al.  A study of the effect of the illumination model on the generation of synthetic training datasets , 2020, ArXiv.

[54]  Jitendra Malik,et al.  Inferring 3D Object Pose in RGB-D Images , 2015, ArXiv.

[55]  Dario Izzo,et al.  Satellite Pose Estimation Challenge: Dataset, Competition Design, and Results , 2019, IEEE Transactions on Aerospace and Electronic Systems.

[56]  Ashish Kapoor,et al.  AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles , 2017, FSR.

[57]  Christopher Joseph Pal,et al.  The Importance of Skip Connections in Biomedical Image Segmentation , 2016, LABELS/DLMIA@MICCAI.

[58]  Qinghua Hu,et al.  Vision Meets Drones: A Challenge , 2018, ArXiv.

[59]  Cemal Aker,et al.  Using deep networks for drone detection , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[60]  Sven Kosub,et al.  A note on the triangle inequality for the Jaccard distance , 2016, Pattern Recognit. Lett..

[61]  Wei Sun,et al.  PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[63]  Dani Lischinski,et al.  Creating and chaining camera moves for quadrotor videography , 2018, ACM Trans. Graph..

[64]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[65]  Alexey Dosovitskiy,et al.  Unsupervised Learning of Shape and Pose with Differentiable Point Clouds , 2018, NeurIPS.

[66]  Dieter Fox,et al.  LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[68]  Giuseppe Loianno,et al.  Human Gaze-Driven Spatial Tasking of an Autonomous MAV , 2019, IEEE Robotics and Automation Letters.

[69]  Sanja Fidler,et al.  Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research , 2019, ArXiv.

[70]  Alain Pagani,et al.  Learning 6DoF Object Poses from Synthetic Single Channel Images , 2018, 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct).

[71]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[72]  Ashish Kapoor,et al.  AirSim-W: A Simulation Environment for Wildlife Conservation with UAVs , 2018, COMPASS.

[73]  Marc Van Droogenbroeck,et al.  Mid-Air: A Multi-Modal Dataset for Extremely Low Altitude Drone Flights , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).