Combination of computer vision detection and segmentation for autonomous driving

Most existing deep learning networks for computer vision aim to improve the performance of either semantic segmentation or object detection alone. This study develops a unified network architecture that performs both semantic segmentation and object detection, recognizing people, cars, and roads simultaneously. To this end, we build a simulated environment in the Unity engine and use it to generate our dataset. We train the proposed network, which combines segmentation and detection branches, on this simulated dataset. The network performs end-to-end prediction and achieves strong results on the test set. It is also efficient, processing each image in about 30 ms on an NVIDIA GTX 1070.
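The core architectural idea, a shared feature encoder feeding both a per-pixel segmentation head and a grid-based detection head, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the paper's actual network: the layer sizes, the number of classes, and the YOLO-style detection output layout are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class UnifiedNet(nn.Module):
    """Hypothetical sketch: one shared backbone, two task-specific heads.

    Layer widths and the detection encoding (n_boxes * [x, y, w, h, conf]
    plus class scores per grid cell) are illustrative assumptions, not the
    configuration used in the paper.
    """
    def __init__(self, n_classes=3, n_boxes=2):
        super().__init__()
        # Shared encoder: three stride-2 convs downsample the input by 8x
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation head: upsample back to input resolution,
        # producing per-pixel class logits
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1),
        )
        # Detection head: a 1x1 conv over the coarse feature grid,
        # predicting boxes and class scores per cell (YOLO-style)
        self.det_head = nn.Conv2d(64, n_boxes * 5 + n_classes, 1)

    def forward(self, x):
        feats = self.backbone(x)  # shared features drive both tasks
        return self.seg_head(feats), self.det_head(feats)

net = UnifiedNet()
img = torch.randn(1, 3, 128, 128)
seg, det = net(img)
print(seg.shape)  # torch.Size([1, 3, 128, 128]) - per-pixel logits
print(det.shape)  # torch.Size([1, 13, 16, 16]) - per-cell box predictions
```

Sharing the encoder is what makes a single forward pass serve both tasks, which is how a unified network can stay within a real-time budget like the ~30 ms per image reported above.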
