RoadNet-RT: High Throughput CNN Architecture and SoC Design for Real-Time Road Segmentation

In recent years, convolutional neural networks (CNNs) have gained popularity in many engineering applications, especially computer vision. To achieve better performance, more complex structures and advanced operations are incorporated into neural networks, which results in very long inference times. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. To reach real-time processing speed, this article proposes a lightweight, high-throughput CNN architecture named RoadNet-RT for road segmentation. It achieves a MaxF score of 92.55% on the KITTI road segmentation dataset. The inference time is about 9 ms per frame when running on a GTX 1080 GPU. Compared with the state-of-the-art network, RoadNet-RT speeds up inference by a factor of 17.8 at the cost of only a 3.75% loss in accuracy. In addition, it reaches 92.98% accuracy on the CamVid dataset. Several techniques, such as depthwise separable convolution and non-uniform kernel size convolution, are optimized in the hardware accelerator design. The proposed CNN architecture has been successfully implemented on a ZCU102 MPSoC FPGA, achieving a computation capability of 331 GOPS with INT8 quantization. The system throughput reaches 196.7 frames per second with an input image size of $280\times 960$. The source code is published at https://github.com/linbaiwpi/RoadNet-RT.
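To make the cost argument behind depthwise separable convolution concrete, the following minimal sketch counts multiply-accumulate operations (MACs) for a standard convolution and for its depthwise separable counterpart, and converts the reported throughput into a per-frame time. The layer shape used here (32 input channels, 64 output channels, a 3x3 kernel at the full $280\times 960$ resolution) is a hypothetical example chosen for illustration and is not taken from the RoadNet-RT architecture; the function names are likewise illustrative.

```python
def standard_conv_macs(h, w, c_in, c_out, k_h, k_w):
    """MACs of a standard convolution (stride 1, output size h x w)."""
    return h * w * c_in * c_out * k_h * k_w


def depthwise_separable_macs(h, w, c_in, c_out, k_h, k_w):
    """MACs of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k_h * k_w   # one k_h x k_w filter per input channel
    pointwise = h * w * c_in * c_out       # 1x1 convolution mixes the channels
    return depthwise + pointwise


def frame_time_ms(frames_per_second):
    """Average time per frame implied by a throughput figure."""
    return 1000.0 / frames_per_second


# Hypothetical layer shape (an assumption, not a layer from the paper).
H, W, C_IN, C_OUT, K_H, K_W = 280, 960, 32, 64, 3, 3

std = standard_conv_macs(H, W, C_IN, C_OUT, K_H, K_W)
sep = depthwise_separable_macs(H, W, C_IN, C_OUT, K_H, K_W)

print(f"standard conv:       {std:,} MACs")
print(f"depthwise separable: {sep:,} MACs")
print(f"cost ratio:          {sep / std:.4f}")  # equals 1/C_OUT + 1/(K_H*K_W)
print(f"frame time at 196.7 FPS: {frame_time_ms(196.7):.2f} ms")
```

For this assumed layer the separable form needs roughly one eighth of the MACs of the standard form, which follows from the general ratio $1/C_{out} + 1/(K_h K_w)$. The kernel height and width are kept as separate parameters only so that non-square kernels can be tried; this is a convenience of the sketch and does not describe how the paper defines its non-uniform kernel sizes.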
