Near-Field Perception for Low-Speed Vehicle Automation Using Surround-View Fisheye Cameras

Cameras are the primary sensor in automated driving systems. They provide high information density and are optimal for detecting road infrastructure cues laid out for human vision. Surround-view camera systems typically comprise four fisheye cameras, each with a field of view of 190° or more, which together cover the full 360° around the vehicle and focus on near-field sensing. They are the principal sensors for low-speed, high-accuracy, close-range sensing applications such as automated parking, traffic jam assistance, and low-speed emergency braking. In this work, we provide a detailed survey of such vision systems, set in the context of an architecture that can be decomposed into four modular components, namely Recognition, Reconstruction, Relocalization, and Reorganization. We jointly call this the 4R Architecture. We discuss how each component accomplishes a specific aspect of perception and argue the position that they can be synergized to form a complete perception system for low-speed automation. We support this argument by presenting results from previous works and architecture proposals for such a system. Qualitative results are presented in the video at https://youtu.be/ae8bCOF77uY.
