Instance Enhancing Loss: Deep Identity-Sensitive Feature Embedding for Person Search

Person search, which is vital for intelligent surveillance, aims at detecting and re-identifying pedestrians in whole surveillance images. However, because pedestrian detections are inaccurate and each training identity has extremely few instances, it remains challenging to learn discriminative representations for person search from labeled identities alone. To this end, this paper proposes a novel loss function called instance enhancing loss (IEL), which learns deep identity-sensitive features by introducing unlabeled identity information. Specifically, the proposed IEL selectively annotates unlabeled identities whose appearance is similar to that of labeled identities, and uses these unlabeled identities together with the labeled ones to train the person search network. The number of unlabeled identities used as labeled instances can be adjusted quantitatively. Moreover, the proposed IEL is trainable and easy to optimize with backpropagation. Extensive experiments on two benchmark datasets, CUHK-SYSU and PRW, show that our method outperforms state-of-the-art methods for person search.
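The abstract does not give the formula for IEL, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a softmax over cosine similarities between a query feature and two feature banks (one of labeled identity prototypes, one of unlabeled instances), and treats the k unlabeled instances most similar to the target prototype as extra positives. The function name `instance_enhanced_loss`, the parameters `k` and `temperature`, and the top-k selection rule are all hypothetical choices made for illustration.

```python
import numpy as np


def instance_enhanced_loss(x, target, labeled, unlabeled, k=2, temperature=0.1):
    """Illustrative loss (an assumption, not the paper's definition).

    x         : (d,) feature of one detected pedestrian
    target    : index of its labeled identity
    labeled   : (L, d) one prototype feature per labeled identity
    unlabeled : (U, d) features of unlabeled pedestrians
    k         : how many unlabeled features to treat as positives
    """
    def l2_normalize(a):
        return a / np.linalg.norm(a, axis=-1, keepdims=True)

    x, labeled, unlabeled = map(l2_normalize, (x, labeled, unlabeled))

    # Select the k unlabeled features closest to the target prototype.
    similarity_to_target = unlabeled @ labeled[target]
    k = min(k, len(unlabeled))
    picked = np.argsort(-similarity_to_target)[:k]

    # Softmax over all labeled prototypes and unlabeled features.
    logits = np.concatenate([labeled @ x, unlabeled @ x]) / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()

    # Positive mass: the target identity plus the selected unlabeled features.
    positive = probs[target] + probs[len(labeled) + picked].sum()
    return -np.log(positive), picked


rng = np.random.default_rng(0)
x = rng.normal(size=8)
labeled = rng.normal(size=(5, 8))
unlabeled = rng.normal(size=(10, 8))
loss, picked = instance_enhanced_loss(x, target=1, labeled=labeled,
                                      unlabeled=unlabeled, k=3)
print(loss, picked)
```

In this sketch, setting `k=0` reduces the loss to an ordinary softmax cross-entropy over both banks, and raising `k` is one way to picture the adjustable number of unlabeled identities that the abstract describes.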
