Feature Selective Small Object Detection via Knowledge-based Recurrent Attentive Neural Network

At present, the performance of deep neural network in general object detection is comparable to or even surpasses that of human beings. However, due to the limitations of deep learning itself, the small proportion of feature pixels, and the occurence of blur and occlusion, the detection of small objects in complex scenes is still an open question. But we can not deny that real-time and accurate object detection is fundamental to automatic perception and subsequent perception-based decision-making and planning tasks of autonomous driving. Considering the characteristics of small objects in autonomous driving scene, we proposed a novel method named KB-RANN, which based on domain knowledge, intuitive experience and feature attentive selection. It can focus on particular parts of image features, and then it tries to stress the importance of these features and strengthenes the learning parameters of them. Our comparative experiments on KITTI and COCO datasets show that our proposed method can achieve considerable results both in speed and accuracy, and can improve the effect of small object detection through self-selection of important features and continuous enhancement of proposed method, and deployed it in our self-developed autonomous driving car.

[1]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Nanning Zheng,et al.  Cognition-Based Deep Learning: Progresses and Perspectives , 2018, AIAI.

[4]  Rita Cucchiara,et al.  Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model , 2016, IEEE Transactions on Image Processing.

[5]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[6]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Forrest N. Iandola,et al.  SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Stefano Ermon,et al.  Label-Free Supervision of Neural Networks with Physics and Domain Knowledge , 2016, AAAI.

[9]  Silvio Savarese,et al.  Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[10]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[11]  Gang Wang,et al.  Recurrent Attentional Networks for Saliency Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Quoc V. Le,et al.  Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Peng Wang,et al.  Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[19]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[22]  A. Mtibaa,et al.  Automatic detection and recognition of road sign for driver assistance system , 2012, 2012 16th IEEE Mediterranean Electrotechnical Conference.

[23]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[24]  Xiaohong W. Gao,et al.  Recognition of traffic signs based on their colour and shape features extracted using human vision models , 2006, J. Vis. Commun. Image Represent..

[25]  Saturnino Maldonado-Bascón,et al.  Knowledge Modeling for the Traffic Sign Recognition Task , 2005, IWINAC.

[26]  Visvanathan Ramesh,et al.  A system for traffic sign detection, tracking, and recognition using color, shape, and motion information , 2005, IEEE Proceedings. Intelligent Vehicles Symposium, 2005..

[27]  Giovanni Pilato,et al.  Road signs recognition using a dynamic pixel aggregation technique in the HSV color space , 2001, Proceedings 11th International Conference on Image Analysis and Processing.

[28]  Jean-Luc Starck,et al.  A combined approach for object detection and deconvolution , 2000, Astronomy and Astrophysics Supplement Series.

[29]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[30]  Lutz Priese,et al.  On hierarchical color segmentation and applications , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.