Small Traffic Sign Detection Through Selective Feature Fusion Based Faster R-CNN With Arc-Softmax Loss

Traffic signs are basic and important elements in maps. They are related to traffic regulations, profoundly affecting/managing the travel mode of human beings and efficiency of vehicle running. Traffic sign mining technology is applied in many research fields such as traditional map update, high-precision map establishment and automatic driving. Image based traffic sign identification technology has the advantages of low cost and high efficiency over manual processing mode, and traffic sign detection has thus become a significant task with the pacing advancement of autonomous driving. However, many common object detection methods cannot be directly applied to this task, as the size of traffic signs are very small yet they vary considerably. Due to such characteristics, features of traffic signs are difficult to capture, and are harder to discriminate between classes. To address this problem, we proposed a selective feature fusion based Faster R-CNN with Arc-Softmax loss, which optimizes the detection performance from the two following ways: network structure and loss function. We discover that each Faster R-CNN layer is only capable of detecting targets within a certain size range. By carefully selecting and combining different layers' feature maps, we can extract features that effectively represent traffic signs of various sizes. Then, Arc-Softmax loss penalizes the angular distances between the feature vectors of different signs, and their corresponding weight vectors of the last fully connected layers, thereby encouraging intra-class compactness and inter-class separability between learned features. Extensive analysis and experiments on the challenging Tsinghua-Tencent 100K benchmark demonstrate the superiority and implementation simplicity of our proposed method. Code will be made publicly available.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[4]  Nojun Kwak,et al.  Enhancement of SSD by concatenating feature maps for object detection , 2017, BMVC.

[5]  Jitendra Malik,et al.  Beyond Skip Connections: Top-Down Modulation for Object Detection , 2016, ArXiv.

[6]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Yu Liu,et al.  Learning Deep Features via Congenerous Cosine Loss for Person Recognition , 2017, ArXiv.

[9]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[10]  Baoli Li,et al.  Traffic-Sign Detection and Classification in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Meng Yang,et al.  Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[12]  Xu Tang,et al.  PyramidBox: A Context-assisted Single Shot Face Detector , 2018, ECCV.

[13]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[14]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[16]  Min Chen,et al.  Detecting Small Signs from Large Images , 2017, 2017 IEEE International Conference on Information Reuse and Integration (IRI).

[17]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[19]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[20]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yunchao Wei,et al.  Perceptual Generative Adversarial Networks for Small Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[24]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[30]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[31]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[32]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).