SADet: Learning An Efficient and Accurate Pedestrian Detector

Although anchor-based detectors have made substantial progress in pedestrian detection, their overall performance still needs improvement for practical applications, \emph{e.g.}, a better trade-off between accuracy and efficiency. To this end, this paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector, forming a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection, which includes three main improvements. First, we optimize the sample generation process by assigning soft tags to outlier samples, generating semi-positive samples with continuous tag values between $0$ and $1$, which not only produces more valid samples but also strengthens the robustness of the model. Second, a novel Center-$IoU$ loss is applied as the regression loss for bounding box regression; it retains the good characteristics of the IoU loss while addressing some of its defects. Third, we design Cosine-NMS for post-processing the predicted bounding boxes, and further propose adaptive anchor matching, which lets the model match anchor boxes to full or visible bounding boxes according to the degree of occlusion, making the NMS and anchor matching algorithms more suitable for occluded pedestrian detection. Though structurally simple, SADet achieves state-of-the-art results and a real-time speed of $20$ FPS for VGA-resolution images ($640 \times 480$) on challenging pedestrian detection benchmarks, \emph{i.e.}, CityPersons and Caltech, and on the human detection benchmark CrowdHuman, leading to a new attractive pedestrian detector.
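As a point of reference for the first two improvements, the following is a minimal sketch of the standard IoU loss that Center-$IoU$ is said to build on, together with one possible way to turn an overlap score into a continuous tag between $0$ and $1$. The abstract does not give the Center-$IoU$ formula or the soft-tag rule, so neither is reproduced here: the function `soft_tag`, its linear ramp, and the thresholds `low` and `high` are hypothetical and chosen only for illustration.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss, 1 - IoU. This is the baseline loss only,
    not the Center-IoU loss proposed in the paper."""
    return 1.0 - iou(pred, target)


def soft_tag(overlap: float, low: float = 0.35, high: float = 0.5) -> float:
    """Hypothetical soft tag: 1 at or above `high`, 0 at or below `low`,
    and a linear ramp in between. The thresholds and the ramp shape are
    assumptions made for illustration, not the paper's rule."""
    if overlap >= high:
        return 1.0
    if overlap <= low:
        return 0.0
    return (overlap - low) / (high - low)


if __name__ == "__main__":
    anchor = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    overlap = iou(anchor, ground_truth)
    print(overlap, iou_loss(anchor, ground_truth), soft_tag(overlap))
```

Under this illustrative rule, an anchor whose overlap falls between the two thresholds would contribute to training as a semi-positive sample with a fractional tag instead of being discarded or treated as fully negative.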
