Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs

The ability to interpret a scene is an important capability for a robot that is supposed to interact with its environment. The knowledge of what is in front of the robot is, for example, relevant for navigation, manipulation, or planning. Semantic segmentation labels each pixel of an image with a class label and thus provides a detailed semantic annotation of the surroundings to the robot. Convolutional neural networks (CNNs) are popular methods for addressing this type of problem. The available software for training and the integration of CNNs for real robots, however, is quite fragmented and often difficult to use for non-experts, despite the availability of several high-quality open-source frameworks for neural network implementation and training. In this paper, we propose a tool called Bonnet, which addresses this fragmentation problem by building a higher abstraction that is specific for the semantic segmentation task. It provides a modular approach to simplify the training of a semantic segmentation CNN independently of the used dataset and the intended task. Furthermore, we also address the deployment on a real robotic platform. Thus, we do not propose a new CNN approach in this paper. Instead, we provide a stable and easy-to-use tool to make this technology more approachable in the context of autonomous systems. In this sense, we aim at closing a gap between computer vision research and its use in robotics research. We provide an open-source codebase for training and deployment. The training interface is implemented in Python using TensorFlow and the deployment interface provides C++ library that can be easily integrated in an existing robotics codebase, a ROS node, and two standalone applications for label prediction in images and videos.

[1]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Sven Behnke,et al.  RGB-D object detection and semantic segmentation for autonomous manipulation in clutter , 2018, Int. J. Robotics Res..

[3]  George J. Pappas,et al.  Localization from semantic observations via the matrix permanent , 2016, Int. J. Robotics Res..

[4]  Patrick Rives,et al.  Semantic representation for navigation in large-scale environments , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[5]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Cyrill Stachniss,et al.  An effective classification system for separating sugar beets and weeds for precision farming applications , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[9]  Cyrill Stachniss,et al.  REAL-TIME BLOB-WISE SUGAR BEETS VS WEEDS CLASSIFICATION FOR MONITORING FIELDS USING CONVOLUTIONAL NEURAL NETWORKS , 2017 .

[10]  Vincent Lepetit,et al.  Semantic segmentation for 3D localization in urban environments , 2017, 2017 Joint Urban Remote Sensing Event (JURSE).

[11]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[12]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[13]  Tianqi Chen,et al.  Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.

[14]  Kaiming He,et al.  Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.

[15]  Roland Siegwart,et al.  Beyond point clouds - 3D mapping and field parameter measurements using UAVs , 2015, 2015 IEEE 20th Conference on Emerging Technologies & Factory Automation (ETFA).

[16]  Wolfram Burgard,et al.  Agricultural robot dataset for plant classification, localization and mapping on sugar beet fields , 2017, Int. J. Robotics Res..

[17]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[20]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[21]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Nico Blodow,et al.  Autonomous semantic mapping for robots performing everyday manipulation tasks in kitchen environments , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[25]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[26]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wolfgang Förstner,et al.  eTRIMS Image Database for Interpreting Images of Man-Made Scenes , 2009 .

[29]  Cyrill Stachniss,et al.  Classifying Obstacles and Exploiting Knowledge About Classes for Efficient Humanoid Navigation , 2018, 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids).

[30]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Peer Neubert,et al.  Synthesized semantic views for mobile robot localization , 2017, 2017 European Conference on Mobile Robots (ECMR).

[33]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Gordon Wyeth,et al.  Place categorization and semantic mapping on a mobile robot , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[36]  Cyrill Stachniss,et al.  Real-Time Semantic Segmentation of Crop and Weed for Precision Agriculture Robots Leveraging Background Knowledge in CNNs , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[37]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[38]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[39]  Dongbing Gu,et al.  Building a grid-point cloud-semantic map based on graph for the navigation of intelligent wheelchair , 2015, 2015 21st International Conference on Automation and Computing (ICAC).

[40]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).