Weakly Supervised Person Re-Identification

In the conventional person re-id setting, it is assumed that the labeled images are the person images within the bounding box for each individual; this labeling across multiple nonoverlapping camera views from raw video surveillance is costly and time-consuming. To overcome this difficulty, we consider weakly supervised person re-id modeling. The weak setting refers to matching a target person with an untrimmed gallery video where we only know that the identity appears in the video without the requirement of annotating the identity in any frame of the video during the training procedure. Hence, for a video, there could be multiple video-level labels. We cast this weakly supervised person re-id challenge into a multi-instance multi-label learning (MIML) problem. In particular, we develop a Cross-View MIML (CV-MIML) method that is able to explore potential intraclass person images from all the camera views by incorporating the intra-bag alignment and the cross-view bag alignment. Finally, the CV-MIML method is embedded into an existing deep neural network for developing the Deep Cross-View MIML (Deep CV-MIML) model. We have performed extensive experiments to show the feasibility of the proposed weakly supervised setting and verify the effectiveness of our method compared to related methods on four weakly labeled datasets.

[1]  Yu Wu,et al.  Exploit the Unknown Gradually: One-Shot Video-Based Person Re-identification by Stepwise Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Zhi-Hua Zhou,et al.  Discover Multiple Novel Labels in Multi-Instance Multi-Label Learning , 2017, AAAI.

[3]  Zhi-Hua Zhou,et al.  Multi-instance multi-label learning , 2008, Artif. Intell..

[4]  Xiaogang Wang,et al.  DeepReID: Deep Filter Pairing Neural Network for Person Re-identification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jing Xu,et al.  Attention-Aware Compositional Network for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Xiaogang Wang,et al.  Diversity Regularized Spatiotemporal Attention for Video-Based Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Liang Wang,et al.  Mask-Guided Contrastive Attention Model for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Xiaoli Z. Fern,et al.  Multi-instance multi-label learning in the presence of novel class instances , 2015, ICML.

[9]  Shaogang Gong,et al.  Unsupervised Person Re-identification by Deep Learning Tracklet Association , 2018, ECCV.

[10]  Shaogang Gong,et al.  Deep Association Learning for Unsupervised Video Person Re-identification , 2018, BMVC.

[11]  Nanning Zheng,et al.  Deep self-paced learning for person re-identification , 2017, Pattern Recognit..

[12]  Vasant Honavar,et al.  Multi-Instance Multi-Label Learning for Image Classification with Large Vocabularies , 2011, BMVC.

[13]  Zhihui Li,et al.  Deep Feature Learning via Structured Graph Laplacian Embedding for Person Re-Identification , 2017, Pattern Recognit..

[14]  Jianfei Cai,et al.  MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks with Privileged Information , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Xiaoli Z. Fern,et al.  Instance Annotation for Multi-Instance Multi-Label Learning , 2013, TKDD.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[18]  Pong C. Yuen,et al.  Dynamic Label Graph Matching for Unsupervised Video Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Xiaogang Wang,et al.  End-to-End Deep Learning for Person Search , 2016, ArXiv.

[20]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[21]  Rita Cucchiara,et al.  3DPeS: 3D people dataset for surveillance and forensics , 2011, J-HGBU '11.

[22]  Horst Bischof,et al.  Person Re-identification by Descriptive and Discriminative Classification , 2011, SCIA.

[23]  Ji Feng,et al.  Deep MIML Network , 2017, AAAI.

[24]  Xiaogang Wang,et al.  Locally Aligned Feature Transforms across Views , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Hai Tao,et al.  Evaluating Appearance Models for Recognition, Reacquisition, and Tracking , 2007 .

[26]  Lin Wu,et al.  Deep adaptive feature embedding with local sample distributions for person re-identification , 2017, Pattern Recognit..

[27]  Zhi-Hua Zhou,et al.  Towards Discovering What Patterns Trigger What Labels , 2012, AAAI.

[28]  Zhi-Hua Zhou,et al.  Ensemble multi-instance multi-label learning approach for video annotation task , 2011, MM '11.

[29]  Xiaoli Z. Fern,et al.  Context-Aware MIML Instance Annotation , 2013, 2013 IEEE 13th International Conference on Data Mining.

[30]  Zheru Chi,et al.  Multi-instance multi-label image classification: A neural approach , 2013, Neurocomputing.

[31]  Xiao-Yuan Jing,et al.  Video-Based Person Re-Identification by Simultaneously Learning Intra-Video and Inter-Video Distance Metrics , 2016, IEEE Transactions on Image Processing.

[32]  Jian-Huang Lai,et al.  Person Re-Identification by Camera Correlation Aware Feature Augmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Huchuan Lu,et al.  Stepwise Metric Promotion for Unsupervised Video Person Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Masayuki Mukunoki,et al.  Shinpuhkan2014: A Multi-Camera Pedestrian Dataset for Tracking People across Multiple Cameras , 2014 .

[35]  Zhi-Hua Zhou,et al.  Multi-Modal Image Annotation with Multi-Instance Multi-Label LDA , 2013, IJCAI.

[36]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[37]  Qi Tian,et al.  Beyond Part Models: Person Retrieval with Refined Part Pooling , 2017, ECCV.

[38]  Qi Tian,et al.  MARS: A Video Benchmark for Large-Scale Person Re-Identification , 2016, ECCV.

[39]  Yi Yang,et al.  Unsupervised Person Re-identification , 2018, ACM Trans. Multim. Comput. Commun. Appl..

[40]  Shaogang Gong,et al.  Person Re-identification by Video Ranking , 2014, ECCV.

[41]  Tsukasa Ogasawara,et al.  Temporal-Enhanced Convolutional Network for Person Re-Identification , 2018, AAAI.

[42]  Shaogang Gong,et al.  Person Re-Identification by Unsupervised Video Matching , 2016, Pattern Recognit..

[43]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.