Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation

Sensor fusion is indispensable for improving accuracy and robustness in autonomous navigation. However, in the space of end-to-end sensorimotor control, this multimodal outlook has received limited attention. In this work, we propose a novel stochastic regularization technique, called Sensor Dropout, to make multimodal sensor policy learning more robust. We also introduce an auxiliary loss on the policy network, alongside the standard DRL loss, that helps reduce the action variations of the multimodal sensor policy. Through empirical testing, we demonstrate that the proposed policy can 1) operate with minimal performance drop in noisy environments, and 2) remain functional even when a subset of sensors fails. Finally, through the visualization of gradients, we show that the learned policies are conditioned on the same latent input distribution despite having multiple sensory observation spaces, a hallmark of true sensor fusion. The efficacy of the multimodal policy is demonstrated through simulations in TORCS, a popular open-source racing car game. A demo video can be seen here: this https URL
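To make the core idea concrete, here is a minimal sketch of modality-level dropout as the abstract describes it: at each step, entire sensor blocks are randomly zeroed out (never all at once) and the survivors rescaled, forcing the policy not to over-rely on any single modality. The function name, the per-modality list representation, and the rescaling scheme are illustrative assumptions, not the paper's exact formulation.

```python
import random

def sensor_dropout(features, drop_prob=0.5, rng=random):
    """Randomly zero out entire sensor modalities, never all of them at once.

    `features` is a list of per-modality feature vectors (lists of floats).
    Surviving modalities are rescaled so the expected total magnitude is
    roughly preserved. Illustrative sketch only.
    """
    n = len(features)
    # Sample a keep-mask over modalities; resample if all would be dropped,
    # since the policy must always see at least one sensor subset.
    while True:
        mask = [rng.random() >= drop_prob for _ in range(n)]
        if any(mask):
            break
    scale = n / sum(mask)  # compensate for the dropped modalities
    return [
        [x * scale for x in f] if keep else [0.0] * len(f)
        for f, keep in zip(features, mask)
    ]
```

At test time one would disable the dropout (keep every modality, scale of 1), mirroring standard dropout practice; the same mechanism also lets the trained policy be evaluated under deliberate sensor-subset failures.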
