Sensor modality fusion with CNNs for UGV autonomous driving in indoor environments

We present a novel end-to-end learning framework that enables unmanned ground vehicles (UGVs) to autonomously navigate unknown environments by fusing raw camera pixels with LiDAR depth measurements. A deep neural network architecture is introduced that performs modality fusion and reliably predicts steering commands even in the presence of sensor failures. The proposed network is trained on our own dataset, collected from a LiDAR and a camera mounted on a UGV driven through an indoor corridor environment. A comprehensive experimental evaluation demonstrates the robustness of the architecture and shows that the network can autonomously navigate the corridor environment. Furthermore, we show that fusing the camera and LiDAR modalities provides benefits beyond robustness to sensor failures. Specifically, the fused multimodal system shows a potential to navigate around static and dynamic obstacles and to handle changes in environment geometry without being trained for these tasks.
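As a rough illustration of the kind of fusion architecture described above, the sketch below shows a two-branch network in PyTorch: a 2-D convolutional branch over camera frames, a 1-D convolutional branch over a planar LiDAR range scan, and a fully connected head that regresses a single steering command. The layer counts, feature sizes, input shapes, and the FusionSteeringNet name are illustrative assumptions rather than the paper's actual network; likewise, the idea of randomly zeroing one modality during training to encourage robustness to sensor failures is mentioned only as a common technique, not as the authors' method.

```python
# Minimal sketch (illustrative assumptions, not the authors' exact architecture):
# a two-branch CNN that fuses a camera image with a LiDAR range scan and
# regresses a continuous steering command.
import torch
import torch.nn as nn


class FusionSteeringNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Camera branch: 2-D convolutional feature extractor over RGB frames.
        self.cam_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        # LiDAR branch: 1-D convolutions over a planar range scan.
        self.lidar_branch = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
        )
        # Fusion head: concatenated camera and LiDAR features -> steering command.
        self.head = nn.Sequential(
            nn.Linear(64 * 4 * 4 + 32 * 8, 128), nn.ReLU(),
            nn.Linear(128, 1),  # single continuous steering output
        )

    def forward(self, image, scan):
        f_cam = self.cam_branch(image)      # (B, 1024)
        f_lidar = self.lidar_branch(scan)   # (B, 256)
        # During training, one modality's features could be randomly zeroed here
        # (a hypothetical "sensor dropout") to simulate sensor failures.
        fused = torch.cat([f_cam, f_lidar], dim=1)
        return self.head(fused)


# Example forward pass; random tensors stand in for real sensor data.
net = FusionSteeringNet()
image = torch.randn(1, 3, 120, 160)   # camera frame (B, C, H, W)
scan = torch.randn(1, 1, 360)         # LiDAR range scan (B, 1, beams)
steering = net(image, scan)           # (1, 1) steering command
```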
