Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation

We propose a convolutional network with hierarchical classifiers for per-pixel semantic segmentation, which is able to be trained on multiple, heterogeneous datasets and exploit their semantic hierarchy. Our network is the first to be simultaneously trained on three different datasets from the intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and is able to handle different semantic levelof-detail, class imbalances, and different annotation types, i.e. dense per-pixel and sparse bounding-box labels. We assess our hierarchical approach, by comparing against flat, nonhierarchical classifiers and we show improvements in mean pixel accuracy of 13.0% for Cityscapes classes and 2.4% for Vistas classes and 32.3% for GTSDB classes. Our implementation achieves inference rates of 17 fps at a resolution of 520 x 706 for 108 classes running on a GPU.

[1]  José García Rodríguez,et al.  A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[2]  Rishi Kumar,et al.  Hierarchical CNN for traffic sign recognition , 2016, 2016 IEEE Intelligent Vehicles Symposium (IV).

[3]  Zhi Liu,et al.  Learning Semantic Segmentation with Diverse Supervision , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[4]  Andreas Geiger,et al.  Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art , 2017, Found. Trends Comput. Graph. Vis..

[5]  FengJiashi,et al.  A survey on deep learning-based fine-grained object classification and semantic segmentation , 2017 .

[6]  Johannes Stallkamp,et al.  Detection of traffic signs in real-world images: The German traffic sign detection benchmark , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[7]  Arthur Daniel Costea,et al.  Semi-automatic image annotation of street scenes , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[8]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[14]  Liqing Zhang,et al.  Hierarchical Semantic Classification and Attribute Relations Analysis with Clothing Region Detection , 2016 .

[15]  Shuicheng Yan,et al.  A survey on deep learning-based fine-grained object classification and semantic segmentation , 2017, International Journal of Automation and Computing.