Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras

Detecting objects in 3D from a monocular camera has been successfully demonstrated using various methods based on convolutional neural networks. These methods have been demonstrated on rectilinear perspective images equivalent to being taken by a pinhole camera, whose geometry is explicitly or implicitly exploited. Such methods fail in images with alternative projections, such as those acquired by fisheye cameras, even when provided with a labeled training set of fisheye images and 3D bounding boxes. In this work, we show how to adapt existing 3D object detection methods to images from fisheye cameras, including in the case that no labeled fisheye data is available for training. We significantly outperform existing art on a benchmark of synthetic data, and we also experiment with an internal dataset of real fisheye images.

[1]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jason Yosinski,et al.  An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution , 2018, NeurIPS.

[4]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Kristen Grauman,et al.  Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery , 2017, NIPS 2017.

[6]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Petros Daras,et al.  OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas , 2018, ECCV.

[8]  Shunping Ji,et al.  Grid Based Spherical CNN for Object Detection from Panoramic Images , 2019, Sensors.

[9]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Massimo Bertozzi,et al.  360° Detection and tracking algorithm of both pedestrian and vehicle using fisheye images , 2015, 2015 IEEE Intelligent Vehicles Symposium (IV).

[11]  Husam Abu-Haimed,et al.  Learning Object-Specific Distance From a Monocular Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Andreas Geiger,et al.  SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images , 2018, ECCV.

[13]  Andrea Simonelli,et al.  Disentangling Monocular 3D Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Toby P. Breckon,et al.  Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery , 2018, ECCV.

[15]  Senthil Yogamani,et al.  Monocular Fisheye Camera Depth Estimation Using Sparse LiDAR Supervision , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[16]  Patrick Mäder,et al.  FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Petros Daras,et al.  Spherical View Synthesis for Self-Supervised 360° Depth Estimation , 2019, 2019 International Conference on 3D Vision (3DV).

[18]  Geng Yan,et al.  Real-time Detection, Tracking, and Classification of Moving and Stationary Objects using Multiple Fisheye Images , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[19]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Martin Stellmacher,et al.  Optimization of a CNN-based Object Detector for Fisheye Cameras , 2019, 2019 IEEE International Conference of Vehicular Electronics and Safety (ICVES).

[21]  Ethan Fetaya,et al.  Real-Time Category-Based and General Obstacle Detection for Autonomous Driving , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[22]  Jonathan Ventura,et al.  Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video with Applications for Virtual Reality , 2020, Int. J. Semantic Comput..

[23]  Xiaorui Zhu,et al.  Object detection on panoramic images based on deep learning , 2017, 2017 3rd International Conference on Control, Automation and Robotics (ICCAR).

[24]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[25]  Nassir Navab,et al.  Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images , 2018, ECCV.

[26]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[27]  Jelena Novosel,et al.  RotInvMTL: Rotation Invariant MultiNet on Fisheye Images for Autonomous Driving Applications , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Kris Kitani,et al.  Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[30]  Alexei A. Efros,et al.  Unsupervised Domain Adaptation through Self-Supervision , 2019, ArXiv.

[31]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Muhammad Abdul Haseeb,et al.  DisNet: A novel method for distance estimation from monocular camera , 2018 .

[35]  Kurt Keutzer,et al.  Algorithm-hardware Co-design for Deformable Convolution , 2020, ArXiv.

[36]  Adrien Gaidon,et al.  ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Rares Ambrus,et al.  SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[38]  Kaigui Bian,et al.  Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images , 2019, ArXiv.

[39]  Jong Bin Ryu,et al.  SweepNet: Wide-baseline Omnidirectional Depth Estimation , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[40]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).