FuseMODNet: Real-Time Camera and LiDAR Based Moving Object Detection for Robust Low-Light Autonomous Driving

Moving object detection is a critical task for autonomous vehicles. Since dynamic objects pose a higher collision risk than static ones, the ego-vehicle's trajectory must be planned with respect to the future states of the moving elements of the scene. Motion can be perceived using temporal information such as optical flow. Conventional optical flow is computed from camera images only, which makes it prone to failure under low illumination. LiDAR sensors, on the other hand, are independent of illumination because they measure the time-of-flight of their own emitted laser pulses. In this work, we propose a robust, real-time CNN architecture for Moving Object Detection (MOD) under low-light conditions that captures motion information from both camera and LiDAR sensors. We demonstrate the impact of our algorithm on the KITTI dataset, where we simulate a low-light environment to create a novel dataset, "Dark-KITTI". We obtain a 10.1% relative improvement on Dark-KITTI and a 4.25% relative improvement on standard KITTI over our baselines. The proposed algorithm runs at 29 fps on a standard desktop GPU at 256×1224 image resolution.
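
To illustrate the kind of two-stream camera/LiDAR motion fusion the abstract describes, below is a minimal PyTorch sketch: one encoder for camera optical flow, one for a dense, image-aligned LiDAR-derived motion map, fused by channel concatenation and decoded into a per-pixel static/moving mask. The layer widths, fusion point, input encodings, and decoder are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch (illustrative only) of a two-stream motion-fusion network
# for moving object detection. Both inputs are assumed to be 2-channel,
# image-aligned motion maps at the same resolution.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions; the second downsamples by a factor of 2.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    )

class FusionMODSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Camera stream: 2-channel optical flow.
        self.cam_enc = nn.Sequential(conv_block(2, 32), conv_block(32, 64))
        # LiDAR stream: 2-channel LiDAR flow projected into the image plane.
        self.lidar_enc = nn.Sequential(conv_block(2, 32), conv_block(32, 64))
        # Fuse by concatenation, upsample back to input resolution,
        # and predict two classes (static / moving).
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 2, 1),
        )

    def forward(self, cam_flow, lidar_flow):
        fused = torch.cat([self.cam_enc(cam_flow), self.lidar_enc(lidar_flow)], dim=1)
        return self.decoder(fused)  # logits of shape (N, 2, H, W)

# Example with KITTI-sized inputs (256x1224), batch of one.
net = FusionMODSketch()
cam = torch.randn(1, 2, 256, 1224)
lidar = torch.randn(1, 2, 256, 1224)
print(net(cam, lidar).shape)  # torch.Size([1, 2, 256, 1224])

Mid-level fusion by concatenation is only one of several plausible choices; summation or late fusion of per-stream predictions would fit the same two-stream skeleton.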
