Improved Model Structure with Cosine Margin OIM Loss for End-to-End Person Search

End-to-end person search is a novel task that integrates pedestrian detection and person re-identification (re-ID) into a joint optimization framework. However, the pedestrian features learned by most existing methods are not discriminative enough due to the potential adverse interaction between detection and re-ID tasks and the lack of discriminative power of re-ID loss. To this end, we propose an Improved Model Structure (IMS) with a novel re-ID loss function called Cosine Margin Online Instance Matching (CM-OIM) loss. Firstly, we design a model structure more suitable for person search, which alleviates the adverse interaction between the detection and re-ID parts by reasonably decreasing the network layers shared by them. Then, we conduct a full investigation of the weight of re-ID loss, which we argue plays an important role in end-to-end person search models. Finally, we improve the Online Instance Matching (OIM) loss by adopting a more robust online update strategy, and importing a cosine margin into it to increase the intra-class compactness of the features learned. Extensive experiments on two challenging datasets CUHK-SYSU and PRW demonstrate our approach outperforms the state-of-the-arts.

[1]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jian Yang,et al.  Person Search via A Mask-Guided Two-Stream CNN Model , 2018, ECCV.

[3]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Shengcai Liao,et al.  Person re-identification by Local Maximal Occurrence representation and metric learning , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiaogang Wang,et al.  Joint Detection and Identification Feature Learning for Person Search , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Qi Tian,et al.  Person Re-identification in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Bingpeng Ma,et al.  Person Search in a Scene by Jointly Modeling People Commonness and Person Uniqueness , 2014, ACM Multimedia.

[8]  Horst Bischof,et al.  Large scale metric learning from equivalence constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Shaogang Gong,et al.  Multi-camera activity correlation analysis , 2009, CVPR.

[10]  Xiaogang Wang,et al.  Intelligent multi-camera video surveillance: A review , 2013, Pattern Recognit. Lett..

[11]  Yunchao Wei,et al.  IAN: The Individual Aggregation Network for Person Search , 2017, Pattern Recognit..

[12]  Xiaogang Wang,et al.  Unsupervised Salience Learning for Person Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Xiang Li,et al.  An enhanced deep feature representation for person re-identification , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[14]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[16]  Hong Liu,et al.  A Discriminatively Learned Feature Embedding Based on Multi-Loss Fusion For Person Search , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17]  Meng Yang,et al.  Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[18]  Yi Yang,et al.  Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Hong Liu,et al.  Instance Enhancing Loss: Deep Identity-Sensitive Feature Embedding for Person Search , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[20]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[22]  Bo Zhao,et al.  Neural Person Search Machines , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[24]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).