Design of lightweight pedestrian detection network in railway scenes

The deployment of many advanced pedestrian detection applications is largely hindered by the high computational cost of deep convolutional neural networks (CNNs). In this paper, we propose a two-step pruning method to design a lightweight pedestrian detection network for railway scenes. The first step is feature pyramid network (FPN) pruning, which exploits the characteristics of pedestrians in railway scenes and the FPN structure of YOLOv3. The second step is regular channel pruning, which builds on network slimming and is an accelerator-friendly pruning strategy. Our two-step pruning method reduces the number of parameters by about 88% and the computational complexity by about 74% while maintaining comparable detection accuracy.
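To make the second step more concrete, the following is a minimal sketch of channel selection in the spirit of network slimming, which ranks channels by the magnitude of their batch-normalization scaling factors and prunes those below a global threshold. It is not the authors' implementation. The function name `select_channels`, the `multiple=8` alignment, and the interpretation of "regular" pruning as rounding each layer's kept channel count up to a fixed multiple are all assumptions made for illustration.

```python
import math


def select_channels(gammas_per_layer, prune_ratio, multiple=8):
    """Return, per layer, the sorted indices of the channels to keep.

    gammas_per_layer: list of lists of BN scaling factors, one list per layer.
    prune_ratio: fraction of all channels targeted for removal (0 <= r < 1).
    multiple: hypothetical alignment; kept counts are rounded up to it.
    """
    # Global threshold: the magnitude at the prune_ratio quantile of all
    # scaling factors, pooled across layers.
    all_mags = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    cut = int(len(all_mags) * prune_ratio)
    threshold = all_mags[cut] if cut < len(all_mags) else float("inf")

    kept = []
    for layer in gammas_per_layer:
        # Rank this layer's channels from largest to smallest magnitude.
        order = sorted(range(len(layer)), key=lambda i: abs(layer[i]), reverse=True)
        n_keep = sum(1 for g in layer if abs(g) >= threshold)
        # Assumed regularity rule: round up to the alignment, keep at least
        # one aligned block, and never exceed the layer's original width.
        n_keep = max(multiple, math.ceil(n_keep / multiple) * multiple)
        n_keep = min(len(layer), n_keep)
        kept.append(sorted(order[:n_keep]))
    return kept


# Usage: two toy layers with 16 channels each.
layers = [[1.0] * 5 + [0.01] * 11, [0.5] * 16]
kept = select_channels(layers, prune_ratio=0.4)
```

Because of the rounding, the fraction of channels actually removed can be smaller than `prune_ratio`. In this sketch that is the price of keeping every layer width aligned, which is one plausible reading of why a regular pruning pattern suits hardware accelerators.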
