Real-time scene parsing by means of a convolutional neural network for mobile robots in disaster scenarios

Disaster robotics poses particular challenges for computer vision, both in terms of image characteristics (motion blur, difficult lighting conditions, lack of up/down orientation, etc.) and in terms of training data (limited availability, difficulty of annotation due to image quality, etc.). We developed a system for real-time scene parsing, intended for use in a support system for operators of remote-controlled mobile robots deployed in disaster areas. Our testbed is video footage gathered by a snake-like mobile robot exploring an artificial collapsed-building environment. The core of the system is a relatively small-scale convolutional neural network. Our approach combines pixel-level learning with superpixel-level classification, in an effort to learn efficiently from a relatively small number of partially annotated frames. Our classification system is capable of real-time operation, and demonstrates that convolutional neural networks can be applied effectively even under the harsh conditions imposed by disaster robotics.
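As a rough illustration of how pixel-level predictions can be combined with superpixel-level classification, the sketch below shows only the inference-time aggregation step: a small fully-convolutional network produces per-pixel class scores, and each superpixel is labeled by majority vote over the pixels it contains. The network architecture, the use of SLIC superpixels, and the majority-vote rule are all assumptions made for illustration; the abstract does not specify the paper's actual design.

# Minimal sketch: superpixel-level aggregation of per-pixel CNN predictions.
# The tiny fully-convolutional network, SLIC superpixels, and majority voting
# are assumptions for illustration, not the paper's documented method.

import numpy as np
import torch
import torch.nn as nn
from skimage.segmentation import slic

NUM_CLASSES = 6  # hypothetical number of scene classes


class SmallSceneCNN(nn.Module):
    """A deliberately small fully-convolutional network (the paper only
    states that its CNN is 'relatively small-scale'; layer sizes here
    are placeholders)."""

    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),  # per-pixel class scores
        )

    def forward(self, x):
        return self.features(x)  # shape: (B, num_classes, H, W)


def superpixel_labels(image_rgb: np.ndarray, model: nn.Module,
                      n_segments: int = 400) -> np.ndarray:
    """Label each SLIC superpixel by majority vote over the CNN's
    per-pixel predictions inside it (aggregation rule is an assumption)."""
    model.eval()
    with torch.no_grad():
        # HWC uint8 image -> normalized NCHW float tensor
        x = torch.from_numpy(image_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        pixel_pred = model(x).argmax(dim=1).squeeze(0).numpy()  # (H, W) class ids

    # Over-segment the frame into superpixels
    segments = slic(image_rgb, n_segments=n_segments, start_label=0)

    # Assign each superpixel the most frequent pixel-level class within it
    out = np.zeros_like(pixel_pred)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        out[mask] = np.bincount(pixel_pred[mask]).argmax()
    return out

One motivation for this kind of aggregation, consistent with the abstract's emphasis on harsh image conditions, is that voting within superpixels smooths noisy per-pixel predictions caused by motion blur and difficult lighting.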
